CONSENSUS CONFIGURATIONAL BIAS MONTE
CARLO METHOD AND SYSTEM FOR
PHARMACOPHORE STRUCTURE DETERMINATION

This specification includes in Sec. 8 computer program 
listings that are exemplary embodiments of the computer 
programs of this invention. 

A portion of the disclosure of this patent document 
contains material which is subject to copyright protection. 
The copyright owner has no objection to the facsimile 
reproduction by any one of the patent disclosure, as it 
appears in the Patent and Trademark Office patent files and 
records, but otherwise reserves all copyright rights 
whatsoever. 

This invention was made with Government support under 
Grant number 1R43CA62752-01 awarded by the National 
Institutes of Health. The Government has certain rights in 
the invention. 

1. FIELD OF THE INVENTION
The field of this invention is computer assisted methods 
of drug design. More particularly the field of this 
invention is computer implemented smart Monte Carlo methods 
which utilize NMR and binders to a target of interest as 
inputs to determine highly accurate molecular structures that 
must be possessed by a drug in order to achieve an effect of 
interest. Illustrative U.S. Patents are 5,331,573 to Balaji 
et al., 5,307,287 to Cramer, III et al., 5,241,470 to Lee et
al., and 5,265,030 to Skolnick et al. 




2. BACKGROUND



Protein interactions have recently emerged as a 
fundamental target for pharmacological intervention. For 
example, the top two major uncured diseases in the United 
States are atherosclerosis (the principal cause of heart 
attack and stroke) and cancer. These diseases are 
responsible for greater than 50% of all U.S. mortality and
cost the U.S. economy over $200 billion per year. A
consistent picture of these diseases, which has gradually
emerged during the past ten years of molecular biological and
medical research, views both as triggered by disordering of
specific molecular recognition events that take place among 
sets of proteins present in both the normal and disease 
states. 

Hierarchical, organized patterns of protein-protein
interactions are often referred to as "pathways" or
"cascades." At the molecular level, cancers have been
determined to be the deregulation of pathways of interacting
proteins responsible for guiding cellular growth and
differentiation. During the past year, individual cellular
events have been organized into nearly complete mechanistic
explanations of how a cell's behavior is controlled by its
environment and how communication pathway errors lead to
uncontrolled proliferation and cancer. Disruptions in similar
pathways are responsible for the proliferation of blood
vessel walls marking the atherosclerotic disease state (Cook
et al., 1994, Nature 369:361-362; Hall, 1994, Science
264:1413-1414; Ross, 1993, Nature 362:801-809; Zhang et al.,
1993, Nature 364:308-313).

Inhibition or stimulation of particular protein-substrate
interactions has long been a known drug target.
Many important anti-hypertensives, neurotransmitter
analogues, antibiotics, and chemotherapeutic agents act in
this fashion. Captopril, an antihypertensive drug, was
designed based on its ability to antagonize a focal
blood-pressure-regulating enzyme.

Proteins involved in biological processes, either as
part of protein-protein pathways or as enzymes, are composed
of domains (Campbell et al., 1994, Trend. BioTech.
12:168-172; Rothberg et al., 1992, J. Mol. Biol.
227:367-370). Domains, or regions of the protein of stable
three dimensional (secondary and tertiary) structures, play
several major roles, including providing on their surface
small regions ("examples of targets"), where proteins and
substrates are able to bind and interact, and functioning as
structural units holding other domains together as part of a
large protein (tertiary and quaternary structure). The
interaction surface of a domain or target is fundamental to
determining binding specificity. Targets are often small
enough that the principal contribution to the binding energy
is short range, highly localized to several amino acids
(Wells, 1994, Curr. Op. Cell Biol. 6:163-174). The
functional specificity of targets and domains, responsible
for the incredible diversity of cellular function, ultimately
rests with the arrangement of amino acid side chains forming
their interaction surfaces, or targets (Marengere et al.,
1994, Nature 369:502-505).

It can be appreciated, therefore, that pharmacological
intervention affecting the specific protein-protein and
protein-substrate recognition events occurring at protein
targets is of fundamental importance, particularly for
effective drug design.

However, achieving desired pharmacological interventions
in a predictable manner remains as elusive as ever. Early
approaches to drug design depended on the chance observation
of biological effects of a known compound or the screening of
large numbers of exotic compounds, usually derived from
natural sources, for any biological effects. The nature of
the actual protein target was usually unknown.

2.1. TARGET STRUCTURE-BASED APPROACHES TO DRUG DESIGN

Rational approaches to drug design have met with only
limited success. Current rational approaches are based on
first determining the entire structure of the proteins
involved in particular interactions, examining this structure
for the possible targets, and then predicting possible drug
molecules likely to bind to the possible target. Thus the
location of each of the thousands of atoms in a protein must
be accurately determined before drug design can begin.
Direct experimental and indirect computational methods for
protein structure determination are in current use. However,
none of these methods appears to be sufficiently accurate for
drug design purposes according to current rational
approaches.

The primary direct experimental methods for determining
the structure of proteins involved in particular interactions
are X-ray crystallography, relying on the interaction of
electron clouds with X-rays, and liquid nuclear magnetic
resonance (NMR), relying on correlations between polarized
nuclear spins interacting via indirect dipole-dipole
interactions. X-ray methods provide information on the
location of every heavy atom in a crystal of interest
accurate to 0.5-2.0 Å (1 Å = 10^-8 cm). Drawbacks of X-ray
methods include difficulties in obtaining high-quality
crystals, expense and time associated with the
crystallization process, and difficulties in resolving
whether or not the structure of the crystalline forms is
representative of the in vivo conformation (Clore et al.,
1991, J. Mol. Biol. 221:47; Shaanan et al., 1992, Science
227:961-964). High resolution, multidimensional, liquid
phase NMR techniques represent an attractive alternative, to
the extent that they can be applied in situ (i.e., in an
aqueous environment) to the study of small protein domains
(Yu et al., 1994, Cell 76:933-945). However, the analysis of
the various mutual correlations is complex and time
consuming, and the correlations (primarily from the nuclear
Overhauser effect) provide no better accuracy than X-ray
methods. Isotopic enrichment of proteins with 13C and 15N
reduces the time associated with analysis, but at great
expense (Anglister et al., 1993, Frontiers of NMR in Biology
III L2011).

Protein structures determined by any of these current
methods do not predict success in subsequent drug design.
Resolution obtainable either by measurement or computation,
generally 0.5-2 Å, has often been found to be inadequate for
effective direct drug design, or for selection of a lead
compound from organic compound libraries. The resolution
required to understand both drug affinity and drug
specificity, although not precisely known, is probably
measured in fractions of an Å, down to 0.1 Å (MacArthur et
al., 1994, Trend. BioTech. 12:149-153). This accuracy
appears to be beyond the capabilities of many current
methodologies.

Prior research has identified tools which, although
promising, cannot be used in a coordinated manner for drug
design. One promising measurement approach with speed,
simplicity, accuracy, and the ability to carefully control
the measurement environment is rotational echo double
resonance (REDOR) NMR, a type of solid state NMR (Gullion
and Schaefer, 1989, J. Magnetic Resonance 81:196; Holl et
al., 1990, J. Magnetic Resonance 81:620-626 and McWherter,
1993, J. Am. Chem. Soc. 115:238-244). REDOR accuracy can be
below the 0.1 Å believed to be sufficient for direct drug
design. However, since REDOR measures only a few selected
distances, it is not usable in drug design methods which
depend on the initial determination of the complete structure
of the protein containing the target of interest.

Once a target's structure is determined by the above
methods, most rational drug design paradigms call for the
prediction of small drug structures that will bind (or dock)
to the target. This prediction is generally done by
computational methods, of which several are in current use.
Most seek to predict the position of all the thousands of
atoms in a drug structure. Purely ab initio computational
approaches to high resolution structure analysis, such as
quantum statistical mechanics and molecular dynamics, require
prohibitive computing resources. To apply either approach,
the potential energy, or Hamiltonian, of the entire system
must be known. Statistical mechanics provides an expression
for the probability of any given protein configuration as a
ratio of partition functions. Proper quantum statistical
mechanics required for an exact evaluation of full protein
partition functions is not currently computationally
feasible, as it would involve many thousands of atoms
including the target, the protein, and the aqueous
environment. The application of even simple, approximate
quantum statistical mechanics to simple systems in aqueous
environments is currently a non-trivial task (Chandler, 1991,
in Liquids, Freezing, and Glass Transitions, Elsevier, NY, p.
195). Molecular dynamics computes the dynamics of a
molecule's motion in time. Computing the atomic dynamics of
all of the perhaps thousands of atoms of a protein is an
extreme computational burden. Only picoseconds, or at most a
few nanoseconds, of molecular time can be simulated, which is
insufficient to determine a high resolution, equilibrium
structure (Smit et al., 1994, J. Phys. Chem. 98:8442-8452).
In any case, most of the information determined is wasted,
since only the structure of the protein binding target is of
interest in drug design.

Further, current approximate computational techniques
for protein structure determination are in need of greater
accuracy or efficiency. The most common techniques depend on
Molecular Dynamics or Monte Carlo methods (Nikiforovich,
1994, Int. J. Peptide Protein Res. 44:513-531; Brunger and
Karplus, 1991, Acc. Chem. Res. 24:54-61). These methods
randomly alter initial molecular structures by generating
simulated thermal perturbations, and then average the
ensemble of results to determine a final structure. The
generated perturbation must preserve all structural
constraints and be energetically favorable. Unless both
conditions are met, the perturbation is discarded.
Current Monte Carlo methods applied to constrained protein
structure determinations productively use only approximately
1 out of 10^5 perturbed structures generated (Siepmann et al.,
1993, Nature 365:330-332). This extreme waste of computer
resources results in time consuming, low resolution structure
determinations.
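As a rough illustration of why a conventional constrained Monte Carlo search discards most of its trial moves, the following sketch perturbs a toy chain of points at random, rejects any trial that violates a fixed-distance constraint or fails a Metropolis energy test, and reports the fraction of trials actually used. It is not the method of this invention; the energy function, the constraint, the tolerance, and every name in it are assumptions chosen only for illustration.

```python
import math
import random

def energy(coords):
    # Toy harmonic energy (assumed form): neighbouring points prefer unit spacing.
    return sum((math.dist(a, b) - 1.0) ** 2 for a, b in zip(coords, coords[1:]))

def satisfies_constraint(coords, i, j, target, tol):
    # Hard structural constraint: |distance(i, j) - target| must not exceed tol.
    return abs(math.dist(coords[i], coords[j]) - target) <= tol

def metropolis(coords, steps, step_size, kT, constraint, rng):
    """Unbiased Metropolis sampling; returns final coords and fraction of moves used."""
    i, j, target, tol = constraint
    current = energy(coords)
    accepted = 0
    for _ in range(steps):
        k = rng.randrange(len(coords))
        trial = list(coords)
        trial[k] = tuple(x + rng.uniform(-step_size, step_size) for x in coords[k])
        if not satisfies_constraint(trial, i, j, target, tol):
            continue  # constraint broken: the whole trial is wasted
        new = energy(trial)
        if new <= current or rng.random() < math.exp(-(new - current) / kT):
            coords, current = trial, new
            accepted += 1
    return coords, accepted / steps

rng = random.Random(0)
start = [(float(n), 0.0, 0.0) for n in range(6)]
# Hypothetical constraint: end-to-end distance held at 5.0 within 0.05.
final, acceptance = metropolis(start, 20000, 0.5, 0.6, (0, 5, 5.0, 0.05), rng)
```

Tightening the tolerance or adding further constraints lowers `acceptance`, which is the inefficiency the paragraph above describes.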

To summarize, existing rational drug design methods
based on identification of target structure fail to reliably
yield drug molecules due to experimental structure
determination difficulties and computational difficulties
associated with predicting drug structures with ill-defined
Hamiltonians.

2.2. DIVERSITY-BASED APPROACHES TO DRUG DESIGN

Another method for exploring protein target interactions
utilizes "recognition systems" which comprise huge libraries
of related molecules (Clarkson et al., 1994, Trend. BioTech.
12:173-184). From such a library only those members binding
to the target of interest are selected. Such recognition
systems must encompass the structural diversity of protein
targets while remaining suitable for the selection of
lead compounds for drug design. Antibodies are one classic
example of such a system that certainly meets the recognition
requirement. Unfortunately, the antibody structures needed
for lead compound selection cannot yet be determined rapidly
or accurately enough. While about 2000 recognition regions
have been sequenced, only about 23 in the Brookhaven Protein
Structural Database have structures determined to even within
2 Å (Rees et al., 1994, Trends in Biotech. 12:199-206).

Promising recognition systems at the opposite extreme
comprise huge libraries of small peptides. The small
peptides must be sufficiently diverse so that they attain a
level of affinity and specificity similar to that obtained by
protein domains. Given the role peptides play in nature,
this condition can be met by surprisingly small structures,
with 6 to 12 amino acids. However, linear peptides are either
unstructured or weakly structured at room temperature in
aqueous solutions (Alberg et al., 1993, Science 262:248;
Skalicky et al., 1993, Protein Science 10:1591-1603). From a
practical viewpoint, linear peptides must be constrained to
reduce their degrees of freedom (reduced conformational
entropy) and to increase their chances of binding strongly.
These constraints, or scaffolds, limit the range of stable
conformations and make determining the bound structure more
straightforward (Olivera et al., 1990, Science 249:259; Tidor
et al., 1993, Proteins: Structure Function and Genetics 15:71).






WO 96/30849 



PCT/US96/04229 



Methods are now available to create such libraries and 
to select library members that recognize a specific protein 
target. The production of constrained peptide diversity 
libraries requires synthesizing oligonucleotides with the 
desired degeneracy to code for the peptides and ligating them
into selection vectors (Goldman et al., 1994, Bio/Tech. 
10:1557-1561). Once a constrained structured diversity 
library is created, it is a source from which to select 
specific members that bind to a target of interest. Beginning
with a known pathway involving specific domain-domain or
protein-substrate interactions at a target, molecular
biological methods can be used to identify in a matter of
days small ensembles of highly constrained peptides from
these huge libraries that bind to these domains with high
affinity and specificity.

While this field has been exploding in the last few 
years and showing great potential, it is severely limited by 
its use in isolation without the benefit of integrated 
structural analysis needed both to derive the high resolution
structures of binding peptides and also to direct the
construction of additional structured libraries. Drug design
is not aided by having library members recognizing the
protein target of interest but without any understanding of
why the recognition occurs. This is entirely similar to the
random screening methods of early fortuitous drug design
efforts.

Unfortunately, rational drug design according to current 
approaches (target structure-based) remains an inefficient, 
laborious process with a disproportionately high
lead-compound failure rate. Presently, about 90% of lead
compounds fail to emerge successfully from clinical trials
(Trends in U.S. Pharmaceutical Sales and Research and 
Development, Pharmaceutical Manufacturing Association, 
Washington, D.C., 1993). 

It is becoming clear that low-resolution structures of
an entire protein or target (at 0.5-2 Å), or an
uncharacterized lead, such as produced by chemical diversity
methods, leave much to be desired for use in drug design.

If the limitations of prior art methods were overcome
and a sufficiently accurate structure needed by a molecule to
bind to a target of interest could be determined, existing
chemical libraries could be searched for highly targeted lead
compounds with similar structure (Martin, 1992, J. Medicinal
Chem. 35:2145-2154). This database search can be based not
only on chemical and electronic properties, but also on
geometric information. Such searches at high
resolution (better than 0.25 Å) would provide a vast
improvement over the prior art, as lower resolutions lead to
an exponentially increasing number of potential leads.
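To illustrate how geometric resolution limits the number of hits in such a search, the sketch below compares the pairwise distances between the chemical groups of a query structure with those of each library entry and keeps the entries that agree within a tolerance. The data layout, group labels, coordinates, compound names, and function names are all hypothetical; a real database search would also weigh chemical and electronic properties.

```python
from itertools import combinations
import math

def pair_distances(groups):
    """Map each unordered pair of group labels to its inter-group distance (Å)."""
    return {
        frozenset((a, b)): math.dist(groups[a], groups[b])
        for a, b in combinations(sorted(groups), 2)
    }

def matches(query, candidate, tol):
    """True if candidate has the same groups and every distance agrees within tol."""
    if set(query) != set(candidate):
        return False
    q, c = pair_distances(query), pair_distances(candidate)
    return all(abs(q[pair] - c[pair]) <= tol for pair in q)

def search(query, library, tol):
    return [name for name, groups in library.items() if matches(query, groups, tol)]

# Hypothetical query geometry and two hypothetical library compounds.
query = {"donor": (0.0, 0.0, 0.0), "acceptor": (3.0, 0.0, 0.0), "aromatic": (0.0, 4.0, 0.0)}
library = {
    "cmpd_A": {"donor": (1.0, 1.0, 1.0), "acceptor": (4.1, 1.0, 1.0), "aromatic": (1.0, 5.0, 1.0)},
    "cmpd_B": {"donor": (0.0, 0.0, 0.0), "acceptor": (4.0, 0.0, 0.0), "aromatic": (0.0, 4.0, 0.0)},
}
tight = search(query, library, 0.25)   # only the close geometric match survives
loose = search(query, library, 2.0)    # a coarse tolerance admits both compounds
```

Comparing internal distances rather than raw coordinates makes the match independent of how each compound happens to be positioned and oriented.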

Computational methods to determine high resolution drug
structures from recognition system binding information or NMR
partial distance measurements are not currently available.
No current structure determination method uses such
additional information to make more efficient or more
accurate determinations of high resolution structures
(Holzman, 1994, Amer. Sci. 872:267).

Citation of a reference or discussion hereinabove shall 
not be construed as an admission that such is prior art to 
the present invention.

3. SUMMARY OF THE INVENTION

It is a broad object of this invention to address the
prior art problems of drug design by providing a method of
rational design of drugs that achieve their effect by binding
to a target molecule or molecular complex of interest.
Importantly, this object is achieved without requiring
determination of the structure of the molecule or molecular
complex ("target molecule") bearing the target or even of the
target itself. The method is target structure independent.
The method of the invention uses an interdisciplinary
combination of computational modeling and simulation,
experimental distance constraints, and molecular biology.
In an important aspect, the invention provides a
computer implemented modeling and simulation method to
determine a highly accurate consensus structure for the
pharmacophore and a structure for the remainder of the
molecule from diversity library members that bind to the
protein target of interest. Where prior structure
determination methods focused on the structure of the target
molecule or of the target, the method of this invention is
uniquely adapted to focus instead on the structures of
molecules that bind to the target. Such structural
information is directly applicable to drug design since it
defines the structure a drug must possess to bind to the
target of interest. Also, this structural information is
much easier to determine by use of the present invention,
since it concerns molecules with many fewer atoms than the
target molecule. The method of the invention achieves
accuracy by improving upon the accuracy and utility of the
input structural information. In a further embodiment of the
invention, the method employed for structural determination
is a smart Monte Carlo technique adapted to small constrained
molecules.

The structure determination method of the invention
allows one to take maximum advantage of the information
obtained from the molecular biological selection of the
diversity library members that tightly and specifically bind
to the target molecule of interest. The selected library
members must share some common structure to bind to the same
target molecule. The smart Monte Carlo computer method of
this invention specifically seeks and provides this common
structure.

The invention also provides a method of performing REDOR
NMR measurements of molecules on a solid phase substrate. In
a preferred embodiment, the substrate is a solid phase on
which the molecule (e.g., peptide) has been synthesized, with
a high degree of purity. In another preferred embodiment,
performing REDOR measurements of such a molecule on a
substrate can be done in a dry nitrogen atmosphere, under
hydrated conditions, and when the molecule is either free or
bound to a target. In a specific embodiment, the REDOR
measurements are accurate to better than 0.05 Å from 0 to 4
Å, and to better than 0.1 Å from 4 to 8 Å. In an
advantageous aspect of the invention, the structure
determination method makes maximum use of these highly
accurate internuclear distance measurements to constrain the
determined common structure for the binding library members.

The invention also provides methods of identifying a
compound that specifically binds to a target molecule, by
first screening a diversity library, and then using a genetic
selection method for screening the compounds identified from
the diversity library.

In broad aspects, the invention provides a method and
apparatus for rational and predictable design of new and/or
improved drugs that achieve their effect by binding to a
specified target molecule. More particularly, the invention
is directed to a method for the rational selection of highly
specific lead compounds for such drug design, including the
computer implemented step of highly accurate determination of
the structure responsible for this target binding by the
highly accurate, consensus, configurational bias Monte Carlo
method.

A lead compound serves as a starting point for drug 
development both because it specifically binds to the protein
target of interest, achieving the biological effect of
interest, and because it has or can be modified to have good
pharmacokinetics and medicinal applicability. A final drug
may be the lead compound or may be derived therefrom by
modifying the lead to maximize beneficial effects and
minimize harmful side-effects. Although any lead compound is
useful, a lead that tightly and specifically binds to the
target molecule of interest in a known manner, such as can be
provided by the invention, is of great use. Knowledge of the
high resolution structures in a lead compound responsible for
its binding and activity provides a more focused and
efficient drug development process.
The methods of the invention improve lead compound
determination, by determining the "pharmacophore", the
precise structural characteristics needed for a lead compound
to specifically bind to a target of interest. The most
fundamental specification of a pharmacophore is in terms of
the electronic properties necessary for a molecule to
specifically bind to the surface of a target molecule. These
properties may be fundamentally represented by requirements
on the ground and low lying excited state wave functions of a
pharmacophore, such as, for example, by specifying
requirements on the well known multipole expansion of these
wave functions.

The preferred pharmacophore specification according to
the invention is in terms of both the chemical groups making
up the pharmacophore and determining its electronic
properties and also the geometric relationships of these
groups. This chemical representation is not the only
possible representation of the pharmacophore. Several
chemical arrangements may have similar electronic properties.
For example, if a pharmacophore specification included an -OH
group at a particular position, a substantially equivalent
specification might include an -SH group at the same
position. Equivalent chemical groups that may be substituted
in a pharmacophore specification without substantially
changing its nature are called "homologous".
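One simple way to picture homologous groups is as members of the same equivalence class, so that two pharmacophore specifications are interchangeable when they place homologous groups at the same positions. The sketch below assumes such a lookup table; only the -OH / -SH pairing is taken from the text, and the remaining entries, class names, and function names are hypothetical.

```python
# Hypothetical homology classes. Only the -OH / -SH pairing comes from the text;
# the other entries are illustrative assumptions.
HOMOLOGY_CLASSES = {
    "-OH": "hydroxyl-like",
    "-SH": "hydroxyl-like",
    "-NH2": "amine-like",
    "-COOH": "acid-like",
}

def homologous(group_a, group_b):
    """True if both groups are known and fall in the same homology class."""
    class_a = HOMOLOGY_CLASSES.get(group_a)
    return class_a is not None and class_a == HOMOLOGY_CLASSES.get(group_b)

def equivalent_specs(spec_a, spec_b):
    """Specs map a position label to a chemical group at that position."""
    return spec_a.keys() == spec_b.keys() and all(
        homologous(spec_a[pos], spec_b[pos]) for pos in spec_a
    )

spec_1 = {"p1": "-OH", "p2": "-COOH"}
spec_2 = {"p1": "-SH", "p2": "-COOH"}
same = equivalent_specs(spec_1, spec_2)
```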

In particular embodiments, therefore, this invention
provides a method and apparatus for the highly accurate
determination of the pharmacophore needed to specifically
bind to the target molecule of interest, by a specification
of the geometric relationships of the important chemical
groups. The pharmacophore is preferably determined by a
smart Monte Carlo method from molecular biological input
specifying molecules (preferably selected from among
diversity libraries) that specifically bind to the target
molecule and also preferably from REDOR NMR data specifying a
few highly accurate distances in these selected molecules.
An important advantage provided by the invention is the
ability to make a pharmacophore structure determination
without relying on any knowledge of the structure of the
target molecule or target. Where the target molecule is a
protein, conventional prior art methods have sought to
sequence and determine the structure of the protein
containing the target, hoping thereby to determine active
sites by examination of the structure. A further important
advantage of the invention is that this structure
determination can be made by use of a relatively small number
of actual physical position measurements. In contrast,
conventional methods using X-ray crystallography and liquid
NMR require determination of positions of all atoms in the
molecule ("binder") that specifically binds to the target,
and the target. An additional advantage provided by the
invention is that, in a preferred embodiment wherein REDOR
structural measurements provide input information, the
accuracy of the pharmacophore structure determination can be
at least approximately 0.25-0.50 Å or better. This accuracy
is provided by the combination of an efficient Monte Carlo
technique for structure determination with a few highly
accurate distance determinations.

4. BRIEF DESCRIPTION OF THE DRAWINGS
These and other features, aspects, and advantages of the
present invention will become better understood by reference
to the accompanying drawings, following description, and
appended claims, where:

Fig. 1 is the overall method of this invention in its
broadest aspect;

Fig. 2A and 2B are more detail for the step of Fig. 1
for selecting candidate pharmacophore structures;

Fig. 3 is more detail for the step of Fig. 1 for
performing distance measurements;

Fig. 4 is more detail for the step of Fig. 3 for
performing NMR measurements;

Fig. 5 is REDOR NMR signal response details for the step of
Fig. 3 of data analysis;

Fig. 6 is sample REDOR NMR spectra according to the
method of Fig. 3;

Fig. 7 is sample data analysis according to the method
of Fig. 3;

Fig. 8 is more detail for the step of Fig. 1 for
configurational bias Monte Carlo structure determination;

Fig. 9 is a sample of simulation completion data;

Fig. 10 is further detail of peptide memory
representation used in the method of Fig. 8;

Fig. 11 is additional detail of peptide memory
representation used in the method of Fig. 8;

Fig. 12 is more detail for the step of Fig. 8 of
processor generation of proposed modified structures by Type
I moves;

Fig. 13 is more detail for the step of Fig. 8 of
processor generation of proposed modified structures by Type
II moves;

Fig. 14 is additional detail for the step of Fig. 8 of
processor generation of proposed modified structures by Type
II moves;

Fig. 15 is a structure for implementing the method of
Fig. 8;

Fig. 16 is the main program structure of Fig. 15;

Fig. 17 is the structure modification program structure
of Fig. 15;

Fig. 18A and 18B are the Type I move generator program
structure of Fig. 17;

Fig. 19A and 19B are the Type II move generator program
structure of Fig. 17.

5. DETAILED DESCRIPTION 
For clarity of disclosure, and not by way of limitation, 
the detailed description of the invention is described as a
series of steps. A broad view of the exemplary steps of
which the invention is comprised is presented in Fig. 1, a
brief overview of which is presented in the text that
follows. 

The invention method preferably begins with a target
molecule (or molecular complex) 1 having a binding target of
biological or pharmacological interest. Specific binding of
a molecule to the target is predicted to affect its
biological activity and may provide biological effects of
interest. For example, these effects might include
amelioration of a disease process or alteration of a
physiological response. Lead compounds 8 output from the
invention are able to specifically bind to target molecule 1
and can serve as starting points for the design of a drug
able to specifically bind to the target.

Diversity library screening, step 2, allows the
selection from among library members of a plurality of
molecules [hereinafter called "binders"] that specifically
bind to target molecule (or molecular complex) 1; their
chemical building block structure (e.g., sequence, structural
formula) is then determined. If predetermined binders and
their structure are already available, the invention can use
this information directly without the need for library
screening. If library screening is done, one or more
libraries may be screened. The selected binders all share a
common pharmacophore structure, allowing their specific
binding to the target in a chemically and physically similar
manner. This common structure is preferably iteratively
determined by a select and test method. Candidate
pharmacophore selection, step 3, is based upon chemical
structure homologies. Geometric and conformational
information need not be used at this step and is
preferably not considered. A candidate pharmacophore shared
by all the N binders is selected, step 3, for structure
determination by subsequent steps. The binders will
typically present several candidate chemical pharmacophores,
ignoring conformation considerations. These candidates are
small groups of library building blocks, often contiguous,
each candidate group in one binder being homologous to the
candidate groups in all the other binders. Building block
homologies are determined by applying rules appropriate to
the diversity library. In the preferred embodiment,
homologous building blocks have similar surface chemical
groups, since pharmacophores are defined by a similar
geometric arrangement of chemical structures. In the case of
the preferred library, CX6C, candidate pharmacophores are
amino acid sequences whose side chain surface groups have
similar chemical properties. Amino acid homologies are
determined by mechanical rules described below. These
candidate sequences are typically 3 amino acids long, but may
range from 2 all the way to 6. Where pharmacophores are
defined by their charge distributions, homologous library
building blocks must have similar charge distributions.
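Candidate selection of this kind can be sketched as a search for contiguous runs of 2 to 6 residues whose side-chain classes recur in every binder. The example below is a minimal sketch under stated assumptions: the side-chain class table is a simplified stand-in for the homology rules (which are described later in the specification and are not reproduced here), the binder sequences are invented, and all function names are hypothetical.

```python
# Assumed, simplified side-chain classes standing in for the homology rules.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("AVLIM", "aliphatic"),
    **dict.fromkeys("FWY", "aromatic"),
    "G": "glycine", "P": "proline", "C": "cysteine",
}

def class_windows(sequence, length):
    """Set of side-chain class patterns over every contiguous window of `length`."""
    classes = [SIDE_CHAIN_CLASS[res] for res in sequence]
    return {tuple(classes[i:i + length]) for i in range(len(classes) - length + 1)}

def candidate_pharmacophores(binders, min_len=2, max_len=6):
    """Class patterns of min_len..max_len residues that occur in every binder."""
    shared = {}
    for length in range(min_len, max_len + 1):
        common = set.intersection(*(class_windows(b, length) for b in binders))
        if common:
            shared[length] = common
    return shared

# Hypothetical variable regions (the X positions only) of three binders.
binders = ["ARGDLS", "TKGEFV", "RGDWAY"]
candidates = candidate_pharmacophores(binders)
```

Here the three invented binders share a basic-glycine-acidic pattern of length 3, so that triplet would be one candidate passed to the later distance measurement and structure determination steps. The sketch considers only contiguous windows, whereas the text allows candidate groups that are merely "often contiguous".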
Having selected N binders by screening one or more
libraries and determined a candidate pharmacophore in each
binder, the subsequent steps of distance measurement, step 4,
and Monte Carlo structure determination, step 5, determine a
highly accurate structure for the candidate pharmacophore, if
possible. This determination will be possible if the
candidate is the actual pharmacophore. A subsequent test,
step 6, checks for success of this structure determination.
In particular cases, distance measurements may not be
necessary in order to determine an adequately precise
pharmacophore structure.

Measurements are made, step 4, of a few strategic
distances in the binders that will be most useful for the
subsequent structure determination step. A minimum number of
strategic interatomic distances in the binders are measured
in step 4. These few distances constrain possible binder
structures and make the subsequent complete structure
determination more efficient and more accurate. In preferred
but not limiting embodiments, measurement methods yielding
distances accurate to approximately 0.25 Å or better
are used. The preferred methods use nuclear magnetic
resonance ["NMR"] techniques. Particularly preferred is the
rotational-echo double resonance ["REDOR"] NMR method for
directly measuring 13C-15N internuclear distances in peptides,
the most accurate current method for simply and inexpensively
obtaining such distances. It is generally capable of
accuracy to 0.1 Å and a span of 8 Å. In a specific
embodiment, peptide binders are synthesized from amino acids
labeled with 13C and 15N. Labeling is chosen to obtain the
most useful distance data about the selected candidate
pharmacophore structures. Either backbone nuclei, side chain
nuclei, or both can be labeled. The step is detailed below.
Liquid NMR techniques can also be used to indirectly
determine internuclear distances in peptides, but are less
preferred since they require considerable data interpretation
to obtain distances of less accuracy than those obtained by
use of REDOR.
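
As a hedged illustration of how a few measured distances constrain possible binder structures, the following sketch tests a trial set of atomic coordinates against measured internuclear distances. The atom labels, coordinates, and function name are hypothetical; only the 0.25 Å tolerance and the 8 Å span are taken from the text above.

```python
import math

REDOR_MAX_SPAN = 8.0   # angstroms; measurable span quoted in the text
TOLERANCE = 0.25       # angstroms; preferred accuracy quoted in the text


def violated_restraints(coords, restraints, tol=TOLERANCE):
    """Return restraints whose model distance differs from the measured value by more than tol.

    coords:     dict mapping an atom label to an (x, y, z) tuple in angstroms
    restraints: dict mapping an (atom_i, atom_j) pair to a measured distance
    """
    bad = []
    for (i, j), measured in restraints.items():
        if measured > REDOR_MAX_SPAN:
            raise ValueError(f"{i}-{j}: {measured} A exceeds the assumed measurable span")
        model = math.dist(coords[i], coords[j])
        if abs(model - measured) > tol:
            bad.append(((i, j), model, measured))
    return bad


# Hypothetical labeled nuclei and measurements, for illustration only.
coords = {"C1": (0.0, 0.0, 0.0), "N5": (3.0, 4.0, 0.0), "N7": (0.0, 0.0, 6.0)}
restraints = {("C1", "N5"): 5.1, ("C1", "N7"): 6.5}
print(violated_restraints(coords, restraints))
```

A trial conformation for which the returned list is empty is consistent with every measurement to within the stated tolerance.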

Structure determination, step 5, determines a precise
geometric conformation for both the candidate shared chemical
structures, if possible, and the remainder of the binders.
The preferred but not limiting method, consensus,
configurational bias, Monte Carlo ["CCBMC"] determination,
step 5, is an efficient smart Monte Carlo method uniquely
able to incorporate knowledge from prior steps to obtain
highly accurate physical binder structures. From library
screening, step 2, it is deduced that the binders have a
shared, actual pharmacophore structure because they all bind
specifically to the same target molecule (hence, a
"consensus" method). It is not significant to the method if
the binders come from more than one library as long as they
all have a structure adaptable to representation in the
consensus structure determination step (see infra). From
distance measurements, step 4, a few strategically chosen
distances are accurately known. This information is
heuristically utilized along with an accurate model of the
physical atomic interactions and the allowed molecular
conformations.

Further, these means are particularly adapted for
determining structures of molecules having limited
conformational degrees of freedom at the temperature of
interest and conformationally constrained by, e.g., internal
bonds. Potential conformations are generated and selected by
smart configurational bias techniques which avoid generation of
unnecessarily improbable new conformations. (Hence, a
"configurational bias" method.) The technique is preferably
applied herein to conformationally constrained peptides. A
concerted rotation technique is combined with configurational
bias conformation generation so that new conformations
automatically preserve the internally linked backbone
structure constraints. This technique is preferably applied
to the preferred constrained peptide library, of a sequence
comprising CX6C (wherein X is any amino acid). The technique
is also applicable to other constrained peptide libraries, to
peptoid libraries, and to any more general organic diversity
libraries that meet certain geometric limitations (i.e., that
have structures adaptable to representation in the consensus
structure determination step (see infra)).

The methods of the invention are not limited to the use
of CCBMC for determining a consensus pharmacophore structure.
Alternative embodiments of this invention may use alternative
structure determination methods to determine a consensus
pharmacophore structure. For example, a simple yet expensive
method is to make exhaustive REDOR NMR measurements
characterizing the candidate pharmacophore in each binder and
then average these measurements. A somewhat less expensive
method is to use a conventional Monte Carlo molecular
structure determination method to limit somewhat the number
of REDOR NMR measurements required to characterize the
candidate pharmacophore. Conventional Monte Carlo methods,
being unable to directly make use of partial distance
measurements or consensus binding information, are less
efficient than the CCBMC method and require more distance
measurements. Further, other known techniques of molecular
structure determination, for example folding rules or
molecular dynamics, can be used in place of conventional
Monte Carlo.
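
The simple averaging alternative mentioned above can be sketched as follows, purely by way of example. The sketch assumes that homologous atom pairs have already been given matching labels in every binder; the labels, the distance values, and the function name are hypothetical.

```python
from statistics import mean, stdev


def consensus_distances(per_binder):
    """Average each homologous atom-pair distance over all binders.

    per_binder: list (two or more binders) of dicts mapping a pair label
                to a measured distance in angstroms.
    Returns {label: (mean, sample standard deviation)} for the labels
    measured in every binder; labels missing from any binder are dropped.
    """
    common = set.intersection(*(set(d) for d in per_binder))
    return {
        label: (mean(d[label] for d in per_binder),
                stdev(d[label] for d in per_binder))
        for label in sorted(common)
    }


# Hypothetical measurements for three binders, for illustration only.
measurements = [
    {"C1-N4": 5.0, "C2-N6": 6.8, "C3-N7": 4.2},
    {"C1-N4": 5.2, "C2-N6": 6.8},
    {"C1-N4": 5.1, "C2-N6": 6.8},
]
print(consensus_distances(measurements))
```

A small spread across binders is what would be expected if the labeled groups belong to a shared pharmacophore; a large spread suggests the candidate is not the actual pharmacophore.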
compounds 8 output from this invention and input to the
process of drug testing and development. 

Although the preferred identity and ordering of the
method steps are presented in Fig. 1, the invention is not
limited to this identity and ordering. Other orderings,
especially of steps 3, 4, and 5, are possible to achieve
certain efficiencies. Steps can be inserted and deleted for
optimal effect. For example, an additional partial structure
determination step can be inserted between existing steps 3
and 4 to provide information on how best to make the step 4
strategic measurements. As another example, in an
alternative aspect, in lieu of screening one or more
libraries to select binders, predetermined binders can be
obtained and used (e.g., binders determined by any means to
be specific to the same target molecule); thus, step 2 can be
omitted. In another embodiment, step 4, the measurement
step, can be omitted. While all method steps in the
preferred embodiment assume an aqueous environment at body
temperature (37 °C), to the extent these parameters are
relevant to the particular step, the invention is not limited
to human environmental parameters.

Screening against a diversity library consists of
selecting by assay those library members which bind
specifically to the target molecule of interest. Binding
specificity is preferably a binding constant of less than 1
µM (micromolar), and more preferably less than 100 nM
(nanomolar). Preferably, an assay is done that detects an
effect of binding of the binder to the target molecule on the
target molecule's biological activity, to ensure that the
binding is actually to the biological target of interest.
Also, preferably, the selected binders are tested to further
select those binders that bind to the target molecule
competitively, to ensure that each binds to the same target
in the target molecule.

The output of the screening step is a number, N, of
binders selected from one or more libraries for use by the
subsequent steps of the method. The binders with highest
affinity are preferably selected for use by the subsequent
steps. The chemical structure of each of the N binders
selected for use is determined as part of the member
synthesis and library screening. The primary chemical
structure of the preferred constrained peptide library is
specified by the amino acid sequence of the -X6- portion of
the CX6C molecule. For more general organic diversity
libraries, the selection and arrangement of library building
blocks in the binders must be determined.
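
As a minimal sketch of the selection just described, library members can be filtered by a binding threshold and the N tightest binders retained. The sketch assumes that the "binding constant" is a dissociation constant, so that smaller values mean tighter binding; the member names, values, and function name are hypothetical.

```python
MICROMOLAR = 1e-6   # mol/l
NANOMOLAR = 1e-9    # mol/l


def select_binders(kd_by_member, n, threshold=1 * MICROMOLAR):
    """Keep members whose dissociation constant is below threshold,
    then return the names of the n tightest (lowest Kd first)."""
    specific = [(kd, name) for name, kd in kd_by_member.items() if kd < threshold]
    return [name for kd, name in sorted(specific)[:n]]


# Hypothetical assay results (Kd in mol/l), for illustration only.
assay = {"p1": 5e-7, "p2": 2e-8, "p3": 3e-6, "p4": 9e-8}
print(select_binders(assay, n=2))                              # preferred threshold
print(select_binders(assay, n=5, threshold=100 * NANOMOLAR))   # more preferred
```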

It is a preferred aspect of this invention that the set
of determined lead compounds is selective and small. Example
1 illustrates that as pharmacophore distance tolerances are
relaxed, the number of compounds retrieved by drug database
searches increases geometrically. As this invention
determines high resolution pharmacophore geometries, it can
be expected that database searches, or other methods of
determining leads from pharmacophore structure, will return
only a few, selective, targeted leads. Methods limiting the
number of leads decrease the cost of drug development and are
consequently of considerable utility to the pharmaceutical
industry and medical community. The expense of developing
and evaluating lead compounds for biological effect and
medicinal usefulness is well known. Each lead compound must
be screened for pharmacological usefulness, efficacy, and
safety. Often chemical modifications are required and the
process must be repeated. Finally, the required in vivo
pharmacologic toxicity and clinical trials alone can consume
years of time and millions of dollars.
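
The effect of distance tolerance on the number of retrieved compounds can be illustrated with a toy search, offered only as a sketch. The three-point pharmacophore, the synthetic "database" of distance triples on a regular grid, and the function names are all assumptions; the sketch does not reproduce Example 1 or any real drug database.

```python
from itertools import combinations_with_replacement


def is_hit(query, candidate, tol):
    """True if each sorted candidate distance lies within tol of the query's."""
    return all(abs(q - c) <= tol
               for q, c in zip(sorted(query), sorted(candidate)))


def count_hits(query, database, tol):
    """Number of database entries matching the query within tol (angstroms)."""
    return sum(is_hit(query, entry, tol) for entry in database)


# Hypothetical database: every non-decreasing triple on a 0.5 angstrom grid.
GRID = [3.0 + 0.5 * k for k in range(11)]               # 3.0 ... 8.0 angstroms
DATABASE = list(combinations_with_replacement(GRID, 3))
QUERY = (5.1, 5.6, 6.2)                                  # hypothetical distances

for tol in (0.25, 0.5, 1.0):
    print(tol, count_hits(QUERY, DATABASE, tol))
```

In this toy setting the hit count rises steeply as the tolerance is relaxed, which is the qualitative behavior the paragraph above attributes to real database searches.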

Therefore, starting with a target molecule 1 having a
biologically or pharmacologically interesting target, the
method and apparatus of this invention determine a consensus
pharmacophore structure. This consensus pharmacophore
structure can then be used to determine a selective set of
highly specific lead compounds 8 (Fig. 1) for rational design
of drugs, e.g., capable of acting as ligand-mimics (agonists
or antagonists) for the particular target molecule.
In the following discussion and examples, each of these
steps will be more fully described. 

5.1. SELECTION OF A TARGET MOLECULE 
The target molecule is any one or more molecules
containing a target or putative target of interest. The
target is a binding interaction region. The target can be in
a single molecule or can be a product of a molecular complex.
The target can be a continuous or discontinuous binding
region. The target molecule selected for use (Fig. 1, step
1) is preferably any molecule that is found in vivo
(preferably in mammals, most preferably in humans) and that
has biological activity, preferably involved or putatively
involved in the onset, progression, or manifestation of a
disease or disorder. The target molecule can also be a
fragment or derivative of such an in vivo molecule, or a
chemical entity that contains the same target as the in vivo
molecule. Examples of such molecules are well known in the
art. Such molecules can be of mammalian, human, viral,
bacterial, or fungal origin, or from a pathogen, to give just
some examples. The target molecule is preferably a protein
or protein complex. The target molecules that can be used
include but are not limited to receptors, ligands for
receptors, antibodies or portions thereof (e.g., Fab, Fab',
F(ab')2, constant region), proteins or fragments thereof,
nucleic acids, glycoproteins, polysaccharides, antigens,
epitopes, cells and cellular components, subcellular
particles, carbohydrates, enzymes, enzyme substrates,
oncogenes (e.g., cellular, viral; oncogenes such as ras, raf,
etc.), growth factors (e.g., epidermal growth factor,
platelet-derived growth factor, fibroblast growth factor),
lectins, protein A, protein G, organic compounds,
organometallic compounds, viruses, prions, viroids, lipids,
fatty acids, lipopolysaccharides, peptides, cellular
metabolites, steroids, vitamins, amino acids, sugars,
lipoproteins, cytokines, lymphokines, hormones, T cell
surface antigens (e.g., CD4, CD8, T cell antigen receptor).
In another specific embodiment, an integrin is used as a
target molecule. Such molecules are known to function in
clot formation, and can be used according to the present
invention to develop lead molecules for drugs in the area of
cardiovascular disorders.

Target molecules for use can be obtained commercially
(where the target is commercially available), or can be
synthesized or purified from natural or recombinant sources.
In a specific embodiment, a target molecule is prepared that
has been modified to incorporate an "affinity tag," i.e., a
structure that specifically binds to a known binding partner,
to facilitate recovery/isolation/immobilization of the target
molecule. In a preferred aspect, recombinant expression
methods well known in the art can be used to produce a
protein target molecule as a fusion protein, incorporating a
peptide affinity tag. Such affinity tags include but are not
limited to epitopes of known antibodies (e.g., c-myc epitope
(Evan et al., 1985, Mol. Cell. Biol. 5:3610-3616)), a series
(e.g., 5-7) of His residues (which bind to zinc), maltose
binding sequences such as pmal, etc. Tags are incorporated
into protein targets at either the amino- or carboxy-terminus.
In another embodiment, the target is chemically attached to a
tag (e.g., biotin (which binds to avidin, streptavidin),
streptavidin), e.g., by biotinylation.

The target molecule is purified by standard methods.
For example, a protein target can be purified by standard
methods including chromatography (e.g., ion exchange,
affinity, and sizing column chromatography), centrifugation,
differential solubility, or by any other standard technique
for the purification of proteins; in a preferred embodiment,
reverse phase HPLC (high performance liquid chromatography)
is employed.

Once the target molecule has been purified, it is
preferably tested to ensure that it retains its biological
activity (and thus retains its native conformation). Any
suitable in vitro or in vivo assay can be used. In instances
where the desired target molecule is a fragment or derivative
of a molecule found in vivo, or is a chemical entity
putatively containing the same target as a molecule found in
vivo, it is highly preferred that testing be done of such
desired target molecules prior to their use, so that among
such desired target molecules, only those that have the same
biological activity as the in vivo molecule or compete with a
known ligand to the in vivo molecule, are selected for actual
use as target molecules according to the invention. In the
event that biological activity has been reduced or lost in a
recombinant protein relative to the native form of the
protein, the protein can be recombinantly expressed in a
different host (e.g., yeast, mammalian, or insect) and/or
with a variety of tags and location of tags (on either the
amino- or carboxy-terminal side), in order to attempt to
achieve, or to optimize, recovery of biological activity.

5.2. DIVERSITY LIBRARIES 
According to a preferred embodiment of the invention,
diversity libraries are screened to select binders which
specifically bind to the target molecule. Diversity
libraries are those containing a plurality of different
members. Generally, the greater the number of library
members and the greater the probability that all possible
members are represented, the more preferred the library. In
preferred embodiments, the diversity libraries have at least
10^4 members, and more preferably at least 10^6, 10^8, 10^10, or
10^14 members.
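
To make the relation between library size and the probability that all possible members are represented concrete, the following sketch computes the theoretical number of distinct sequences for a variable region and a simple coverage estimate. The coverage formula assumes that library members are drawn uniformly and independently from all possible sequences, which real libraries only approximate; the function names are hypothetical.

```python
import math


def theoretical_diversity(n_variable, alphabet_size=20):
    """Number of distinct sequences with n_variable freely varying positions."""
    return alphabet_size ** n_variable


def expected_coverage(library_size, n_variable, alphabet_size=20):
    """Expected fraction of all possible sequences present at least once,
    assuming uniform, independent sampling (Poisson approximation)."""
    total = theoretical_diversity(n_variable, alphabet_size)
    return 1.0 - math.exp(-library_size / total)


# Six variable positions, as in a CX6C peptide drawn from 20 amino acids.
print(theoretical_diversity(6))
for size in (1e4, 1e8, 1e10):
    print(size, round(expected_coverage(size, 6), 4))
```

Under these assumptions a library must be several times larger than the theoretical diversity before nearly every possible member is expected to be present.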

Many libraries suitable for use are known in the art and
can be used. Alternatively, libraries can be constructed
using standard methods. Chemical (synthetic) libraries,
recombinant expression libraries, or polysome-based libraries
are exemplary types of libraries that can be used.

In a preferred embodiment, the library screened is a 
constrained, or semirigid library (having some degree of 
structural rigidity). Examples of constrained libraries are
described below. A linear, or nonconstrained library, is 
libraries," that contain oligonucleotide identifiers for each
chemical polymer library member. 

In another embodiment, biological random peptide
libraries are used to identify a binder which binds to a
target molecule of choice. Many suitable biological random
peptide libraries are known in the art and can be used or can
be constructed and used to screen for a binder that binds to
a target molecule, according to standard methods commonly
known in the art.

According to this approach, involving recombinant DNA
techniques, peptides are expressed in biological systems as
either soluble fusion proteins or viral capsid fusion
proteins.

In a specific embodiment, a phage display library, in
which the protein of interest is expressed as a fusion
protein on the surface of a bacteriophage, is used (see,
e.g., Smith, 1985, Science 228:1315-1317). A number of
peptide libraries according to this approach have used the
M13 phage. Although the N-terminus of the viral capsid
protein, protein III (pIII), has been shown to be necessary
for viral infection, the extreme N-terminus of the mature
protein does tolerate alterations such as insertions. The
protein pVIII is a major M13 viral capsid protein, which can
also serve as a site for expressing peptides on the surface
of M13 viral particles, in the construction of phage display
libraries. Other phage such as lambda have been shown also
to be able to display peptides or proteins on their surface
and allow selection; these vectors may also be suitable for
use in production of libraries (Sternberg and Hoess, 1995,
Proc. Natl. Acad. Sci. USA 92:1609-1613).

Various random peptide libraries, in which the diverse 
peptides are expressed as phage fusion proteins, are known in 
the art and can be used. Examples of such libraries are 
described below. 

Scott and Smith, 1990, Science 249:386-390 describe
construction and expression of a library of hexapeptides on
the surface of M13. The library was made by inserting a 33
base pair Bgl I digested oligonucleotide sequence into an Sfi
I digested phage fd-tet, i.e., fUSE5 RF. The 33 base pair
fragment contains a random or "degenerate" coding sequence
(NNK)6 where N represents G, A, T or C and K represents G or
T. Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA
87:6378-6382 also described a library of hexapeptides expressed as
pIII gene fusions of M13 fd phage. PCT publication WO
91/19818 dated December 26, 1991 by Dower and Cwirla
describes a library of pentameric to octameric random amino
acid sequences.

Devlin et al., 1990, Science, 249:404-406, describes a 
peptide library of about 15 residues generated using an (NNS) 
coding scheme for oligonucleotide synthesis in which S is G 
or C. 
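
The degenerate coding schemes named above, (NNK) with K being G or T and (NNS) with S being G or C, can be examined with a short sketch that enumerates their codons against the standard genetic code. The function and variable names are hypothetical, and the full-length estimate assumes that all codons of the scheme are equally likely.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def scheme_summary(third_bases):
    """Summarize an NNx scheme in which the third base is limited to third_bases.

    Returns (number of codons, number of amino acids encoded, list of stop codons).
    """
    codons = [c for c in CODON_TABLE if c[2] in third_bases]
    encoded = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return len(codons), len(encoded), stops


nnk = scheme_summary("GT")   # K = G or T
nns = scheme_summary("GC")   # S = G or C
print("NNK:", nnk)
print("NNS:", nns)

# Fraction of (NNK)6 inserts with no stop codon, assuming equiprobable codons.
stop_free = (1 - len(nnk[2]) / nnk[0]) ** 6
print(round(stop_free, 3))
```

Both schemes keep all twenty amino acids while removing two of the three stop codons, which is one reason such schemes are favored over fully random (NNN) codons.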

Christian and colleagues have described a phage display
library, expressing decapeptides (Christian, R.B., et al.,
1992, J. Mol. Biol. 227:711-718). The DNA of the library was
constructed by use of an oligonucleotide comprising the
degenerate codons [NN(G/T)]10 (SEQ ID NO:8) with a
self-complementary 3' terminus. This sequence forms a hairpin
which creates a self-priming replication site that was used
by T4 DNA polymerase to generate the complementary strand.
The double-stranded DNA was cleaved at the Sfi I sites at the
5' terminus and hairpin for cloning into the fUSE5 vector
described by Scott and Smith, supra.

Lenstra, 1992, J. Immunol. Meth. 152:149-157 describes a
library that was constructed by annealing oligonucleotides of
about 17 or 23 degenerate bases with an 8 nucleotide long
palindromic sequence at their 3' ends. This resulted in the
expression of random hexa- or octa-peptides as fusion
proteins with the β-galactosidase protein in a bacterial
expression vector. The DNA was then converted into a
double-stranded form with Klenow DNA polymerase, blunt-end ligated
into a vector, and then released as Hind III fragments.
These fragments were then cloned into an expression vector at
the sequence encoding the C-terminus of a truncated
β-galactosidase to generate 10^7 recombinants.
Kay et al., 1993, Gene 128:59-65 describes a random 38
amino acid peptide phage display library. 

PCT Publication No. WO 94/18318 dated August 18, 1994 
describes random peptide phage display "TSAR libraries" that 
can be used.

Other biological peptide libraries which can be used 
include those described in U.S. Patent No. 5,270,170 dated 
December 14, 1993 and PCT Publication No. WO 91/19818 dated 
December 26, 1991. 

In a specific embodiment, a "peptide-on-plasmid"
library, containing random peptides fused to a DNA binding
protein that links the peptides to the plasmids encoding
them, can be used (Cull et al., 1992, Proc. Natl. Acad. Sci.
USA 89:1865-1869).

Another alternative to phage display or chemically
synthesized libraries is a polysome-based library, which is
based on the direct in vitro expression of the peptides of
interest by an in vitro translation system (in some
instances, coupled to an in vitro transcription system).
These methods rely on polysomes to translate the genomic
information (in this case encoded by an mRNA molecule, in
some instances made in vitro by transcription from synthetic
DNA) (see, e.g., Korman et al., 1982, Proc. Natl. Acad. Sci.
USA 79:1844-1848). Such in vitro translation-based libraries
include but are not limited to those described in PCT
Publication No. WO 91/05058 dated April 18, 1991; and
Mattheakis et al., 1994, Proc. Natl. Acad. Sci. USA
91:9022-9026.

Diversity library screening, step 2 of Fig. 1,
determines a few, N, members (compounds) from one or more
libraries, and their primary sequences, all of which
specifically bind to target molecule 1 in a similar manner.
A structured organic diversity library is a prescription for
the creation of a huge number of related molecules all built
from combinations of a small number of chemical building
blocks. Preferred diversity libraries for use according to
the invention have members whose binding to a target molecule
is characterized by configurational entropy changes that are
small relative to the binding energy. This means that
library members have definite structures in the bound and,
especially, the unbound states. A preferred example of a
chemical diversity library for use in the invention contains
short peptides with a constrained conformation. Short
peptides without constrained conformations are often freely
flexible in an aqueous environment and adopt no fixed unbound
structure. The binding of such library members is
complicated by significant configurational entropy changes.
To eliminate this complication, it is preferred that all
library members have a constrained structure and bind to the
target molecule in a specific and identifiable manner. One
method of achieving constrained conformation is to require
internal linking, such as by disulfide bonds.

In one embodiment, disulfide bond formation is achieved
by use of libraries that contain peptides having a pair of
invariant cysteine residues, preferably positioned in the
range of 2-16 residues apart, most preferably 6-8 residues
apart, that cross-link in an oxidizing environment to form
cystines (disulfide bonds between cysteines). An example of
such libraries is those containing or expressing peptides of
the form R1CXnCR2, wherein R1 is a sequence of 0-10 amino acids,
C is cysteine, Xn is a sequence of n variant amino acids
(e.g., if all 20 classical amino acids are represented, X
means any one of the 20 classical amino acids); n is an
integer ranging from 2 to 16; and R2 is a sequence of 0-10
amino acids. R1 and R2 can contain invariant or variant amino
acids. Another example of such libraries is those
containing or expressing peptides of the form R1CXnR2, where
R1, X, n, and R2 are as described above; n is preferably 8 or
9. A preferred constrained peptide library, of at least 10* 
members, consists of peptides comprising the sequence CX6C
(SEQ ID NO:1), wherein C is cysteine, X is any naturally
occurring amino acid, and a disulfide bond is formed between
the two cysteines. Additional invariant amino acids (e.g., 
preferably no more than 5-10 amino acids) on either the 
include but are not limited to the D-isomers of the common
amino acids, α-amino isobutyric acid, 4-aminobutyric acid,
Abu, 2-amino butyric acid; γ-Abu, ε-Ahx, 6-amino hexanoic
acid; Aib, 2-amino isobutyric acid; 3-amino propionic acid;
ornithine; norleucine; norvaline, hydroxyproline, sarcosine,
citrulline, cysteic acid, t-butylglycine, t-butylalanine,
phenylglycine, cyclohexylalanine, β-alanine, designer amino
acids such as β-methyl amino acids, Cα-methyl amino acids,
Nα-methyl amino acids, fluoro-amino acids and amino acid
analogs in general. Furthermore, the amino acid can be D
(dextrorotary) or L (levorotary).

By way of example, the incorporation of non-standard or
modified amino acids into libraries can be done by taking
advantage of concurrent development in reassigning the
genetic code (Noren et al., 1989, Science 244:182-188;
Benner, 1994, Trend. BioTech. 12:158-163) and the charging of
specific tRNAs with the desired amino acid (Cornish et al.,
1994, Proc. Natl. Acad. Sci. USA 91:2910-2914). See also
Ibba and Hennecke, 1994, Bio/Technology 12:678-682
(particularly Table I), and references cited therein. These
pre-charged tRNAs are then utilized in the in vitro
translation system to incorporate the non-standard amino acid
into the library of choice. The position of incorporation
can be either random (variant) or defined (invariant). The
defined case can be chosen to maximize the utility of the
resulting placement of the non-natural functional group to
maximize either binding properties or the ability to perform
structural measurements. Similar techniques may be used to
incorporate non-standard amino acids into the peptides.

In a specific embodiment, an iterative approach to
library construction can be taken, as structural information
on the mode of binding to a given target is obtained. For
example, information from structural analysis can be used to
make libraries with library members containing chemical
backbones that match known chemical scaffolds, enhance
solubility or membrane permeability, reduce effect of water
on structure, and incorporate other physical parameters
suggested by structural analysis. Algorithmically
optimized library inserts can be used to increase the chances
of finding binders of interest (see, e.g., Arkin and Youvan,
1992, Bio/Technology 10:297-300).

In other embodiments, the following can be used to
improve library use in both phage and bacterial systems:
production of libraries in bacteria which overproduce the
chaperonins GroES and GroEL (Soderlind et al., 1993,
Bio/Technology 11:503-507), and production in E. coli strains
which prevent degradation in the periplasmic space (Strauch
and Beckwith, 1988, Proc. Natl. Acad. Sci. USA 85:1576-1580;
Lipinska et al., 1989, J. Bacteriology 171:1574-1584).
Purified cofactors such as GroES and GroEL could also be
directly added to an in vitro expression and selection
system.

5.3. SCREENING OF DIVERSITY LIBRARIES 
Once a suitable diversity library has been constructed
(or otherwise obtained), the library is screened to identify
binders having binding affinity for the target. Screening is
done by contacting the diversity library members with the
target molecule under conditions conducive to binding and
then identifying the member(s) which bind to the target
molecule. Screening the libraries can be accomplished by any
of a variety of commonly known methods. See, e.g., the
following references, which disclose screening of peptide
libraries: Parmley and Smith, 1989, Adv. Exp. Med. Biol.
251:215-218; Scott and Smith, 1990, Science 249:386-390;
Fowlkes et al., 1992, BioTechniques 13:422-427; Oldenburg et
al., 1992, Proc. Natl. Acad. Sci. USA 89:5393-5397; Yu et
al., 1994, Cell 76:933-945; Staudt et al., 1988, Science
241:577-580; Bock et al., 1992, Nature 355:564-566; Tuerk et
al., 1992, Proc. Natl. Acad. Sci. USA 89:6988-6992; Ellington
et al., 1992, Nature 355:850-852; U.S. Patent No. 5,096,815,
U.S. Patent No. 5,223,409, and U.S. Patent No. 5,198,346, all
to Ladner et al.; Rebar and Pabo, 1993, Science 263:671-673;
and PCT Publication No. WO 94/18318. See also the references
cited in Section 5.2 hereinabove (disclosing libraries)
regarding methods for screening.

Screening can be carried out by contacting the library
members with an immobilized target molecule and harvesting
those library members that bind to the target. Examples of
such screening methods, termed "panning" techniques, are
described by way of example in Parmley and Smith, 1988, Gene
73:305-318; Fowlkes et al., 1992, BioTechniques 13:422-427;
PCT Publication No. WO 94/18318; and in references cited
hereinabove. In panning methods that can be used to screen
the libraries, the target molecule can be immobilized on
plates, beads, such as magnetic beads, sepharose, etc., or on
beads used in columns. In particular embodiments, the
immobilized target molecule has incorporated an "affinity
tag," as described above, which can be used to effect
immobilization by attaching the tag's binding partner to the
desired solid phase.

In one embodiment, the primary method of selecting from
libraries is the use of solid phase plastic affinity capture
to immobilize the target molecule prior to its use in the
selection (screening) process. This method can be improved
upon to increase throughput, selectivity and specificity.
Solid phase plastic supports can be replaced with magnetic
particles. In phage-based systems, large beads can be used,
but these are not believed to be suitable, due to steric
hindrance, for use in bacterial systems. This steric
hindrance can be avoided by using high gradient magnetic cell
separation with small particles (≈0.5 µm) (Miltenyi et al.,
1990, Cytometry 11:231-238).

In a specific embodiment involving the use of a peptide
phage display library, selection of a binder protein
expressed on the surface of a bacteriophage thus selects both
the binder protein and the DNA that encodes it (the DNA being
within the phage particle). Following binding between the
target molecule and library members, phage are released from
a solid support on which the binder-target molecule complex
is immobilized, and are amplified, e.g., by infecting E. coli
plated with E. coli DH5αF' cells to determine the
number of plaque forming units of the sample. In
certain cases, the platings are done in the
presence of XGal and IPTG for color discrimination
of plaques (i.e., lacZ+ plaques are blue, lacZ-
plaques are white). The titer of the input samples
is also determined for comparison.

Alternatively, as yet another non-limiting example,
screening a diversity library of phage expressing peptides
can be achieved by panning using microtiter plates (see PCT
Publication No. WO 94/18318) as follows:

The target molecule is diluted and a small
aliquot of target molecule solution is adsorbed
onto wells of microtiter plates (e.g., by incubation
overnight at 4°C). An aliquot of BSA solution (1
mg/ml, in 100 mM NaHCO3, pH 8.5) is added and the
plate incubated at room temperature for 1 hr. The
contents of the microtiter plate are flicked out
and the wells washed carefully with PBS-0.05%
Tween® 20. The plates are repeatedly washed free
of unbound target molecules. A small aliquot of
phage solution is introduced into each well and the
wells are incubated at room temperature for 2-24
hrs. The contents of microtiter plates are flicked
out and washed repeatedly. The plates are
incubated with wash solution in each well for 20
minutes at room temperature to allow bound phage
with rapid dissociation constants to be released.
The wells are then washed five more times to remove
all unbound phage.

To recover the phage bound to the wells, a pH
change is used. An aliquot of 50 mM glycine-HCl
(pH 2.0), 100 μg/ml BSA solution is added to the
washed wells to denature proteins and release bound
phage. After 10 minutes at 65°C, the contents are
then transferred into clean tubes, and a small
aliquot of 1 M Tris-HCl (pH 7.5) or 1 M NaH2PO4 (pH
7) is added to neutralize the pH of the phage
sample. The phage are then diluted, e.g., 10° to
10**, and aliquots plated with E. coli DH5αF' cells
to determine the number of plaque forming units
of the sample. In certain cases, the platings are
done in the presence of XGal and IPTG for color
discrimination of plaques (i.e., lacZ+ plaques are
blue, lacZ- plaques are white). The titer of the
input samples is also determined for comparison
(dilutions are generally 10~* to 1(T*).
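
The plating arithmetic behind this titer comparison can be sketched as follows. This is an illustrative calculation only: the function names, the plated volume and the example counts are assumptions introduced here, not values taken from this specification.

```python
def titer_pfu_per_ml(plaque_count, dilution_factor, plated_volume_ml):
    """Plaque forming units per ml of the undiluted sample.

    dilution_factor is the fraction of the original sample present in the
    plated dilution (e.g. 1e-4 for a ten-thousand-fold dilution).
    """
    if dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution factor and plated volume must be positive")
    return plaque_count / (dilution_factor * plated_volume_ml)


def recovery_fraction(output_titer, input_titer):
    """Fraction of the input phage recovered after panning and elution."""
    if input_titer <= 0:
        raise ValueError("input titer must be positive")
    return output_titer / input_titer


# Hypothetical numbers: 50 plaques from 0.1 ml of a 1e-4 dilution of eluate.
eluate_titer = titer_pfu_per_ml(50, 1e-4, 0.1)
print(eluate_titer, recovery_fraction(eluate_titer, 1e10))
```

Comparing the recovery fraction from round to round is one simple way to see whether successive rounds of panning are enriching for binders.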

By way of another example, diversity libraries
expressing peptides as a surface protein of either a particle
or a host cell, e.g., phage or bacterial cell, can be
screened by passing a solution of the library over a column
of the target molecule immobilized to a solid matrix, such as
sepharose, silica, etc., and recovering those particles or
host cells that bind to the column after washing and elution.

In yet another embodiment, screening a library can be
performed by using a method comprising a first "enrichment"
step and a second filter lift step as described in PCT
Publication No. WO 94/18318.

Several rounds of serial screening are preferably
conducted. In a particularly preferred aspect, each round is
varied slightly, e.g., by changing the solid phase on which
immobilization occurs, or by changing the method of
immobilization on (e.g., by changing the linker to) the solid
phase. When using a phage display library, the recovered
cells are then preferably plated at a low density to yield
isolated colonies for individual analysis. By way of
example, the following is done: The individual colonies are
selected, grown and used to inoculate LB culture medium
containing ampicillin. After overnight culture at 37°C, the
cultures are then spun down by centrifugation. Individual
cell aliquots are then retested for binding to the target
molecule attached to the beads. Binding to other beads,
having attached thereto a non-relevant molecule, can be used
as a negative control.

In a specific embodiment, different rounds of screening
can respectively involve selection against targets in
primarily their purified form, and then in their natural
state (e.g., on the surface of a mammalian cell) (see, e.g.,
Marks et al., 1993, Bio/Technology 11:1145-1149, describing
selection against cell surface blood group antigens).

In other examples, subsequent rounds of screening can
involve immobilization of the target molecule by attachment
at different ends (e.g., amino- or carboxy-terminus) of the
target molecule to a solid support, or presentation of
library members by attachment to or fusion at different ends
of the library members.

By way of other examples of screening methods that can
be used, genetic selection methods can be adapted for
screening of libraries, or can be used in a recursive scheme.
Thus, in a specific aspect, the invention provides screening
methods in which methods allowing high throughput and
diversity screening (e.g., screening phage display or
polysome libraries against a ligand) are utilized in initial
rounds, with subsequent rounds employing a genetic selection
technique, in which the presence of a binder of appropriate
specificity increases the activity of or activation of a
transcriptional promoter or origin of replication. Genetic
selection techniques that can be adapted for use (e.g., by
inserting random oligonucleotides in the test plasmid)
include the two-hybrid system for selecting interacting
proteins in yeast, replicative based systems in mammalian
cells, and others (see, e.g., Fields & Song, 1989, Nature
340:245-246; Chien et al., 1991, Proc. Natl. Acad. Sci. USA
88:9578-9582; Vasavada et al., 1991, Proc. Natl. Acad. Sci.
USA 88:10686-10690). Thus, in a specific embodiment,
compounds are produced as fusion proteins, and contacted with
a different fusion protein comprising a target fused to
another molecule, in which specific binding of the fusion
proteins to each other results in an increase in activity or
activation of a transcriptional promoter or an origin of
replication. In a specific embodiment, a genetic selection
method is used in a later round of screening to either select
directly for a library member that binds to a target
molecule, or to select a library member that competitively
inhibits binding of a ligand to the target molecule.

Several exemplary methods for screening a phage/phagemid
library are presented by way of example in Section 6.4
hereinbelow. An exemplary method for screening a
polysome-based library is presented in Section 6.3.3 hereinbelow.

Once binders are selected from a diversity library which
bind to a target molecule of interest, additional assays are
preferably, although optionally, performed, including but not
limited to those described below. Thus, in vivo or in vitro
assays can be performed to test whether binding of a binder
to the target molecule affects the target molecule's
biological activity; binders that exert such an effect are
preferred for use in subsequent steps of the invention.
Alternatively, or in addition, competitive binding assays can
be carried out to test whether the binder competes with other
binders or with a natural ligand of the target molecule, for
binding to the target molecule; binders that compete with
each other, and that compete with the natural ligand, are
preferably selected for use in subsequent steps of the
invention. Alternatively, or in addition to the above
assays, the binding affinity of binders for the target
molecule is determined, by standard methods, or by way of
example, as described in Section 6.5 infra. Binders of the
highest affinity are preferred for use in subsequent steps of
the invention.

5.4. DETERMINING THE SEQUENCE OR
CHEMICAL FORMULA OF BINDERS

Many of the references cited in Sections 5.2 and 5.3
hereinabove, which disclose library construction and/or
screening, also disclose methods that can be used to
determine the sequence or chemical formula of binders
isolated from such libraries. By way of example, a nucleic
acid which expresses a binder can be identified and recovered
from a peptide expression library or from a polysome-based
library, and then sequenced to determine its nucleotide
sequence and hence the deduced amino acid sequence that
mediates binding. (In an instance wherein the sequence of an
RNA is desired, cDNA is preferably made and sequenced.)
Alternatively, the amino acid sequence of a binder can be
determined by direct determination of the amino acid sequence
of a peptide selected from a peptide library containing
chemically synthesized peptides. In a less preferred aspect,
direct amino acid sequencing of a binder selected from a
peptide expression library can also be performed.

Nucleotide sequence analysis can be carried out by any
method known in the art, including but not limited to the
method of Maxam and Gilbert (1980, Meth. Enzymol. 65:499-560),
the Sanger dideoxy method (Sanger et al., 1977, Proc.
Natl. Acad. Sci. U.S.A. 74:5463), the use of T7 DNA
polymerase (Tabor and Richardson, U.S. Patent No. 4,795,699;
Sequenase™, U.S. Biochemical Corp.), or Taq polymerase, or
use of an automated DNA sequenator (e.g., Applied Biosystems,
Foster City, CA).

Direct determination of the chemical formulas of
non-peptide or peptide binders can be carried out by methods well
known in the art, including but not limited to mass
spectrometry, NMR, infrared analysis, etc.

In preferred aspects involving certain types of
libraries well known in the art, sequencing or the use of
known analytic techniques for chemical formula determination
will not be necessary. In some such libraries, the identity
and composition of each member of the library is uniquely
specified by a label or "tag" which is physically associated
with it and hence the compositions of those members that bind
to a given target are specified directly (see, e.g., Ohlmeyer
et al., 1993, Proc. Natl. Acad. Sci. USA 90:10922-10926;
Brenner et al., 1992, Proc. Natl. Acad. Sci. USA
89:5381-5383; Lerner et al., PCT Publication No.
WO 93/20242). In other examples of such libraries, the
library members are created by stepwise synthesis protocols
accompanied by complex record keeping, complex mixtures are
screened, and deconvolution methods are used to elucidate
which individual members were in the sets that had binding
activity, and hence which synthesis steps produced the
members and the composition of individual members (see, e.g.,
Erb et al., 1994, Proc. Natl. Acad. Sci. USA 91:11422-11426).

Step 2 of the invention provides as output N binding
library members (binders) and their sequences or chemical
formulas.

5.5. CANDIDATE PHARMACOPHORE SELECTION 
The prior diversity library screening, step 2,
determines a set of size N of specifically binding members
from one or more diversity libraries. While the binders are
preferably but not necessarily isolated from one or more
diversity libraries (e.g., binders need not be isolated from
diversity libraries; known binders can be simply provided),
the following description shall refer to the preferred
embodiment wherein diversity library members are the binders.
It will be apparent that the description is also readily
applicable to binders that are not isolated from diversity
libraries.

The pharmacophore responsible for the library member
binding is preferably determined by an overall select and
test method in this and subsequent steps. In general, a
pharmacophore is specified by the precise electronic
properties on the surface of the binder that cause binding
to the surface of the target molecule. In the preferred
embodiment, these properties are specified by the underlying,
causative, chemical structures. Chemical structures are
specified generally by groups such as -CH2-, -COOH, and
-CONH2. The preferred pharmacophore representation consists
of a specification of the underlying chemical groups and
their geometric relations. The more precisely the geometric
relations are specified, the more preferred. In preferred
but not limiting aspects, the geometric relations are precise
to at least 0.50 Å, and most preferably, at least 0.25 Å. A
pharmacophore will usually comprise 2 to 4 of such groups,
with 3 being typical. However, for complex protein
recognition targets, a pharmacophore may comprise a greater
number of groups. For example, it is possible that the
entire 6 amino acid sequence, -X6-, may be needed for a member
of the preferred CX6C library to bind to complex targets, in
which case the pharmacophore includes the entire binder.

Considering, by way of example, the case of binders
isolated from the preferred library, of sequence CX6C, the
chemical groups defining a peptide pharmacophore are terminal
groups on amino acid side chains. Typically, therefore, a
sequence of two to four contiguous amino acids will contain
the pharmacophore of interest. For example, Fig. 11
illustrates an Arginine-Glycine-Aspartate sequence forming a
well known platelet aggregation inhibiting pharmacophore,
which is defined by the positions and orientations of the
adjacent -CN3H4, -CαH2-, and -COOH groups. Pharmacophores
formed by discontiguous amino acids are not likely to occur
in the preferred library due to the conformational constraint
on the short peptide imposed by the disulfide bridge.

The selection step determines candidate amino acid
sequences in each binder that define a candidate
pharmacophore by the positions of their terminal groups.
Candidate selection depends substantially only on the
chemical structures of the amino acid side chains and
terminal groups (only very rarely on backbone groups).
Geometric structure is not yet available and cannot be used
for candidate selection. In the preferred embodiment, amino
acids are grouped into homologous groups defined by group
members having similar side chain structure and activity (see
infra). Candidate pharmacophores are found by searching the
sequences of the N binders for short sequences of homologous
amino acids. This search will produce at least one
candidate, because all the binders share the actual
pharmacophore. Several candidates will usually be found
since geometric information is ignored, and the search is
thereby underdetermined.

Fig. 2A illustrates an exemplary method of performing
the search for homologous sequences. Although this method is
illustrated as searching for homologous contiguous sequences
of length 3, it is easily adaptable to search for homologies
of other lengths and also for discontiguous homologous
sequences. If no candidate pharmacophores of length 3 have a
consistent consensus structure, then pharmacophores of length
2, 4, or longer or discontiguous sequences must be searched
and selected for test. For some complex targets, the
pharmacophore may include the entire variable part of the
library member. The exemplary method is a simple depth-first
search for matching amino acid strings. More sophisticated
string search methods are known and are equally applicable to
this invention.

The method begins with the administrative steps 201 and
202 of labeling the binders with integers from 1 to N and
assigning the string variable 'ABC' to the next leftmost
sequence of three amino acids to test in binder 1. If this
is the first candidate selection, 'ABC' will be at the
leftmost position in binder 1. If prior candidates have been
selected, 'ABC' will be assigned one amino acid to the right
of its prior assignment. The FOR loop, formed by steps 203,
206, and 207, then selects each binder from 2 to N for
scanning for a sequence homologous to 'ABC'. Step 203 does
loop administration. Step 206 does the scanning. If
homologous sequences are found, test 207 loops back to scan
the next binder. If homologous sequences have been found in
all binders from 2 to N, the loop exits at step 204. In this
case 'ABC' is a string in binder 1 which is homologous to
other strings in all remaining binders and is thus a
candidate pharmacophore. The method exits at 205 for this
candidate to be structured and tested for whether it is the
actual pharmacophore. If a binder does not have a sequence
homologous to 'ABC', then this string is not a candidate. In
this case, test 208 determines if 'ABC' is at the right end
of binder 1. If so, there are no more homologies to test for
and the method exits at 209. If not, then 'ABC' is advanced
one amino acid to the right 210 and the scan of all binders
is repeated beginning at 203.

Fig. 2B illustrates how string variable 'ABC' is scanned
across binder 1, represented schematically by 220. First,
'ABC' is assigned to X1X2X3 at 221, then to X2X3X4 at 222, to
X3X4X5 at 223, and finally to X4X5X6 at 224.

Given an assignment to 'ABC', step 206 scans each other
binder, for example binder K with K>1, for homologous
sequences. This is simply done by comparing all contiguous
substrings of binder K with 'ABC' to determine if they are
homologous. They are homologous if corresponding amino acids
in the substring and 'ABC' are homologous. In turn, two
amino acids are homologous if they satisfy established
homology rules. Each homologous sequence found in binder K
defines a separate candidate pharmacophore, if sequences
homologous to 'ABC' are found in all other binders.

In a case where discontiguous homologous sequences are
sought, 'ABC' is assigned to amino acids in discontiguous
positions in binder 1 and then compared for homologies to
amino acids in the same relative positions throughout the
other binders.
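
The contiguous search of Fig. 2A can be sketched in a few lines of Python. This is a minimal illustration, not the program listing of this specification: the function names are hypothetical, binders are assumed to be given as one-letter amino acid strings for the variable region only, and the homology rule is assumed to be membership in the same one of the four side-chain classes of the preferred embodiment. A generator stands in for the exit at 205 and later re-entry with 'ABC' advanced.

```python
# Assumed four-class side-chain grouping, written with one-letter codes.
HOMOLOGY_CLASSES = {
    "nonpolar": "ALIVPFWM",
    "polar": "GSTCYNQ",
    "basic": "RKH",
    "acidic": "DE",
}
CLASS_OF = {aa: name for name, members in HOMOLOGY_CLASSES.items() for aa in members}


def homologous(a, b):
    """Equal-length strings are homologous if residues match class by class."""
    return len(a) == len(b) and all(CLASS_OF[x] == CLASS_OF[y] for x, y in zip(a, b))


def matches_in(binder, probe):
    """Start offsets of each contiguous substring of binder homologous to probe."""
    k = len(probe)
    return [i for i in range(len(binder) - k + 1) if homologous(binder[i:i + k], probe)]


def candidate_pharmacophores(binders, length=3):
    """Yield (offset in binder 1, probe, match offsets per other binder)."""
    first, rest = binders[0], binders[1:]
    for start in range(len(first) - length + 1):   # slide 'ABC' left to right
        probe = first[start:start + length]
        hits = []
        for other in rest:                         # scan binders 2..N
            found = matches_in(other, probe)
            if not found:                          # one miss rejects this probe
                break
            hits.append(found)
        else:
            yield start, probe, hits


# Made-up binders for illustration only.
for candidate in candidate_pharmacophores(["ARGDLV", "KGEAAA", "LLHGDM"]):
    print(candidate)
```

With these made-up binders the search reports two candidates (RGD and GDL), which illustrates why the result is underdetermined until geometric information is applied.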

Various rules of amino acid homology may be used in this
invention. In the preferred embodiment, amino acids are
homologous if they are found in the same class of amino
acids, based on side chain activity (see Lehninger,
Principles of Biochemistry, (1982), chap. 5). Preferred
homologous groups of amino acids are as follows. The
nonpolar (hydrophobic) amino acids include alanine, leucine,
isoleucine, valine, proline, phenylalanine, tryptophan and
methionine. The polar neutral amino acids include glycine,
serine, threonine, cysteine, tyrosine, asparagine, and
glutamine. The positively charged (basic) amino acids
include arginine, lysine and histidine. The negatively
charged (acidic) amino acids include aspartic acid and
glutamic acid. The foregoing classes may be modified by
those skilled in chemical arts to create finer
classifications. For example, phenylalanine and tryptophan

The above-described step is the selection step of the
overall select and test method. Distance measurements and
Monte Carlo structuring, steps 4 and 5, determine a consensus
pharmacophore structure for the candidate, if possible. If a
consensus is found, the candidate is the actual
pharmacophore. If a consensus is not found, this selection
step must be revisited, and a new candidate selected for
test.
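
The control flow of the select and test method can be sketched as a simple loop. The sketch below is illustrative only: `select_and_test` and `find_consensus` are hypothetical names, and `find_consensus` is a caller-supplied stand-in for the distance measurement and structuring steps, assumed to return `None` when no consensus structure exists.

```python
def select_and_test(candidates, find_consensus):
    """Return (candidate, structure) for the first candidate with a consensus.

    candidates: iterable of candidate pharmacophores, in the order selected.
    find_consensus: callable standing in for steps 4 and 5; it returns a
    structure, or None when no consensus structure can be found.
    """
    for candidate in candidates:
        structure = find_consensus(candidate)
        if structure is not None:
            return candidate, structure
    return None  # every candidate was rejected; the search must be widened
```

A `None` result corresponds to the case where candidates of other lengths, or discontiguous candidates, must be selected next.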

5.6. INTRAMOLECULAR DISTANCE MEASUREMENTS

Having obtained N binders, their chemical building block
structures (chemical formula or primary sequence), and the
identification of a candidate pharmacophore in each binder,
steps 4 and 5 of the method of this invention cooperatively
determine a precise spatial structure for the candidate
pharmacophore (if it exists; if not, a new candidate
pharmacophore is selected). In the preferred (but not
limiting) embodiment of this invention, N members of the CX6C
library that specifically bind to the protein target of
interest have been screened; their sequences determined; and
a candidate pharmacophore consisting of homologous triplets
(more generally, 2- to 6-mers) of amino acids has been
determined in each binder.

Step 4 measures one or more strategic interatomic
distances: preferably no more than 10-20, e.g., 1-10, or,
more preferably, 1-5. The
remainder of the structure is determined in subsequent steps,
other than by direct measurement. The interatomic distances
measured in step 4 are preferably measured with an accuracy of
at least 2 Å, more preferably at least 1 Å or 0.5 Å or 0.25 Å,
and most preferably at least 0.05 Å. Thus, in a preferred
but not limiting embodiment, distances in the pharmacophore
are specified to at least approximately 0.25 Å. Step 5,
using the CCMBC computational method, then completes
determination of the pharmacophore structure at a high
resolution and the structures of the rest of the binder
molecules with a secondary resolution. Having a high
resolution structure for the pharmacophore of interest is
orders of magnitude more useful than having a low resolution
structure for an entire binder. Consequently, steps 4 and 5
focus resources on the former problem.

A distance measurement method is preferred for use if it
meets certain conditions, as follows. First, accuracy of
distance measurements is preferably at least 0.25
Å for distances on the order of those between amino acids in
a peptide. Second, measurement conditions preferably
approximate target binding conditions, i.e., are
approximately physiologic. For example, crystallization,
which may induce conformational changes, is preferably
avoided. Also, the employed measurement methods preferably
allow one binder sample to be measured when dry, when
hydrated and when bound to the target molecule of interest,
thereby observing the effects of water and conformational
changes on binding. Third, the measurement method is
preferably quick and inexpensive.

Important advantages are conveyed by these
conditions. First, as the method of the invention determines
high resolution pharmacophore structures, use of distances
less accurate than the intended results would almost
certainly result in decreased resolution. Second, as the
CCMBC structure determination method approximates the
structural effects of hydration and target binding, use of
accurate distances including the physical effects of
hydration or binding helps increase the resolution of the
computational results. These distances as used in the CCMBC
method pull the binder structures towards a more accurate
representation both of the bound, hydrated pharmacophore and
also of the remainder of the binder molecule without a
computationally burdensome inclusion of water molecules and
without knowledge of the target molecule's structure.

REDOR NMR is the preferred method of distance
determination. REDOR is a solid phase NMR technique which
directly measures the inter-nuclear dipole-dipole interaction
strength between two spin ½ nuclear species, denoted $D_{AB}$, where
A and B are the two nuclear species measured. The
inter-nuclear distance between A and B is simply determined from
$D_{AB}$ by the following equation:

$$D_{AB} = \frac{\gamma_A \gamma_B h}{4\pi^2 r_{AB}^3} \qquad (1)$$

where $r_{AB}$ is the inter-nuclear distance, h is Planck's
constant, and $\gamma_A$ and $\gamma_B$ are the respective gyromagnetic
ratios of nuclei A and B. REDOR is typically accurate to
better than 0.05 Å and can generally measure distances up to
about 8 Å.
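
The inverse-cube relation between dipolar coupling and distance can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the described method: it uses the SI form of the relation (with the μ0/4π prefactor), standard approximate values for the 13C and 15N gyromagnetic ratios, and hypothetical function names.

```python
import math

MU0_OVER_4PI = 1e-7          # T*m/A, SI prefactor
HBAR = 1.054571817e-34       # J*s, reduced Planck constant
GAMMA_13C = 6.728284e7       # rad s^-1 T^-1
GAMMA_15N = -2.71261804e7    # rad s^-1 T^-1 (the negative sign is physical)


def dipolar_coupling_hz(r_angstrom, gamma_a=GAMMA_13C, gamma_b=GAMMA_15N):
    """Magnitude of the dipolar coupling constant (Hz) at distance r (angstrom)."""
    r_m = r_angstrom * 1e-10
    return abs(MU0_OVER_4PI * gamma_a * gamma_b * HBAR / (2 * math.pi * r_m ** 3))


def distance_angstrom(coupling_hz, gamma_a=GAMMA_13C, gamma_b=GAMMA_15N):
    """Invert the r^-3 law: distance (angstrom) from a measured coupling (Hz)."""
    coupling_at_1_angstrom = dipolar_coupling_hz(1.0, gamma_a, gamma_b)
    return (coupling_at_1_angstrom / coupling_hz) ** (1.0 / 3.0)


for r in (1.0, 2.5, 5.0):
    print(r, round(dipolar_coupling_hz(r), 1))
```

For a 13C-15N pair this gives roughly 3.1 kHz at 1 Å and roughly 200 Hz at 2.5 Å. Because the coupling falls off as the cube of the distance, a small relative error in the measured coupling translates into an even smaller relative error in the distance.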

Any two nuclear species observable and resolvable by NMR
methods and, preferably, adaptable to chemical inclusion in
the diversity library members of interest, may be the basis
of REDOR measurements. Although the subsequent description
is often directed to distance determinations between 13C and
15N nuclei in members of a preferred library comprising the
sequence CX6C, this invention is not so limited. One skilled
in the art can readily adapt the method for use in making
measurements of other types of molecules (e.g., peptides and
nonpeptides); additionally, other nuclear species may be
used. Other common spin ½ species that can be used include
but are not limited to 31P and the halogen 19F.

General references on NMR techniques are Slichter,
Principles of Magnetic Resonance, Berlin, Springer-Verlag
(1989) and Mehring, High Resolution NMR in Solids, Berlin,
Springer-Verlag (1983). REDOR references include Gullion et
al., Rotational-echo double-resonance NMR, J. Magn. Res.
81:196-200 (1989); Pan et al., Determination of C-N
internuclear distance by rotational-echo double-resonance NMR
of solids, J. Magn. Res. 90:330-40 (1990); Garbow et al.,
Determination of the molecular conformation of melanostatin
using 13C, 15N-REDOR NMR spectroscopy, J. Am. Chem. Soc.
115:238-44 (1993), all of which are incorporated herein by
reference.

Other solid phase NMR techniques are applicable but less
preferred. These include but are not limited to those
disclosed in Kolbert et al., Measurement of internuclear
distances by switched angle spinning, J. Physical Chemistry
98:7936 et seq. (1994), and in Raleigh et al., Rotational
Resonance NMR, Chemical Physics Letters 146:71 (1988). These
techniques measure homonuclear distances only to 0.5 Å
accuracy and are less accurate than REDOR. Liquid phase NMR
techniques of NOE (nuclear Overhauser effect) and COESY
(correlation enhanced spectroscopy) can also be used but are
less preferred. They require complex interpretation to
obtain comparable distance accuracy (better than 0.5 Å) in
small molecules with complete rotational freedom.

X-ray crystallography can also be used, although it is
much less preferred, since crystallization may induce
conformational changes in the binder, and since binding to
the target molecule may be necessary for crystallization.

In the case of REDOR measurements of the heteronuclear
distances between 13C and 15N, 13C and 15N are introduced
("labeled") at the positions between which a distance
measurement is needed. The preferred embodiment of the
invention measures the 15N NMR resonance. Since nearly all
the 15N signal will originate with nuclear labels, very little
background signal due to natural abundance nuclei need be
accounted for. Alternatively, the 13C resonance may be
measured, in which case the natural abundance background is
subtracted from the measurements.

Since REDOR depends on observing the internuclear
dipole-dipole interaction, the binder being measured should
be substantially stationary on the time scale of the NMR
signal. The measurement system preferably ensures this
condition. The substrate holding the binder to be measured
can be chosen so as to restrain binder motion, or the
measured sample may be cooled to restrain motion, or,
alternatively, the binder may be bound to its target molecule
in order to restrain its motion.

Further details of the REDOR distance measurements will
make reference to Fig. 3. This illustrates the measurement
method for one labeling of one binder, which is repeated if
the binder requires multiple labelings and also is repeated
for each binder. Subsequent description will focus on only
one binder.

Step 41 chooses a binder labeling. Labeling is
preferably done to obtain the most information about the
pharmacophore consistent with chemical labeling opportunities
and available labeled amino acids. Backbone labeling, for
example, labels the amide N of one amino acid and one of the
backbone C's of a next adjacent or more distant amino acid.
Backbone labeling is typically done in the
vicinity of the candidate pharmacophore. It might also be
done away from a candidate pharmacophore to confirm a
previously determined structure as described for step 6.
Side chain labeling strategies vary with the chemical
opportunities offered by the candidate pharmacophore. If a
terminal N is available, an adjacent side chain or backbone C
can be labeled. If not, the side chain C and backbone amino
N can be labeled. Side chain labeling is preferably on side
chains in the candidate pharmacophore. Preferred labeling in
the candidate pharmacophore is either a backbone amino N and
a nearby backbone C or a side chain C or, if available, a
side chain amino N and an adjacent or nearby side chain C.

In an alternative embodiment, to get the most structural
information on the binders, these labelings are designed to
select the actual major conformation from known possible
conformations. For example, if it is known from preliminary
determinations that a binder may exist in one of a few, e.g.,
two, major backbone or side chain folding patterns, the
labelings are chosen to distinguish these conformations.
Nuclear pairs labeled for measurement are preferably those
that have significantly different distances in the possible
conformations.
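
Choosing a label pair that discriminates between two known conformations can be sketched as a small search. This is a hypothetical illustration: the function name, the coordinate format and the default cut-off (taken from the approximately 8 Å measurement range stated for REDOR) are assumptions, and real labeling choices are also constrained by synthetic availability.

```python
import math
from itertools import product


def best_label_pair(conf_a, conf_b, carbons, nitrogens, max_range=8.0):
    """Pick the (C, N) pair whose distance differs most between conformations.

    conf_a, conf_b: dicts mapping atom name -> (x, y, z) in angstroms.
    Pairs farther apart than max_range in either conformation are skipped,
    since such distances would fall outside the measurable range.
    Returns (difference, carbon, nitrogen, d_a, d_b), or None if no pair fits.
    """
    best = None
    for c, n in product(carbons, nitrogens):
        d_a = math.dist(conf_a[c], conf_a[n])
        d_b = math.dist(conf_b[c], conf_b[n])
        if max(d_a, d_b) > max_range:
            continue
        gap = abs(d_a - d_b)
        if best is None or gap > best[0]:
            best = (gap, c, n, d_a, d_b)
    return best
```

The pair returned is the one for which a single distance measurement most clearly separates the two conformations.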

Multiple labeling of one binder to determine multiple
distances at once is possible, for example, by including one
13C and several 15N nuclei, or vice versa, in one labeled
molecule. Multiple labeling is limited, however, as is
obvious to one skilled in the NMR arts, by chemical shifts of
the various nuclear resonances. REDOR measurement of
multiple 15N-13C distances requires that each spectroscopically
observed 15N or 13C resonance have a distinguishable chemical
shift. If these conditions are not met, several separately
labeled versions of the binder are prepared and measured, one
for each internuclear distance sought.

Step 42 synthesizes the labeled binder after a labeling
has been determined by applying these preferences and rules.
In an embodiment wherein the binder is a peptide, variously
labeled 13C or 15N amino acid reagents for the
synthesis of the labeled binder are widely available from
commercial sources. A preferred supplier is Isotec Inc.
(Miamisburg, OH). Other commercial sources include MSD
Isotopes (Montreal, Canada) and Sigma Chemical Co. (St.
Louis, MO). Step 42 has three substeps: linear peptide
synthesis 43, cyclization 44 (by forming the disulfide bond),
and deprotection of the side groups 45. Synthesis and side
chain deprotection are performed by solid phase peptide
synthesis using standard Boc (tert-butoxycarbonyl) and Fmoc
(9-fluorenylmethyloxycarbonyl) chemistry. Exemplary
references for this method are Merrifield, J. Amer. Chem.
Soc., vol. 85, pp. 2149 et seq. (1963); Carpino et al., J.
Amer. Chem. Soc. (1970); and Stewart et al., Solid Phase
Peptide Synthesis, Berlin, Springer-Verlag (1984), which are
herein incorporated by reference. Cyclization is by
conventional mild oxidation, well known in the chemical arts.
The method of these steps is detailed in Example 2 supra.

To obtain accurate REDOR NMR measurements, the binder
sample is preferably highly purified. Accordingly, it is
preferable that the sample be at least 90% pure (but not
necessary if spurious NMR signals can be discriminated), and
even more preferable that the sample be at least 95% pure.
Such pure samples can be obtained as follows. In a first
synthesis method, the binder peptide is synthesized directly
on the substrate to be used in the subsequent NMR
measurements. In this case particular care is preferably
taken with the standard solid phase synthesis steps of
Example 2. By way of example, synthesis reagents should be
pure, adequate time should be allowed for diffusion of
reagents and solvents throughout the interstices of the
substrate resin, and between steps, prior reagents should be
thoroughly washed from the resin before new reagents are applied.
That the purity, reaction time, and washings are adequate is
gauged by subsequent analysis. An aliquot of the resulting
peptide-resin is taken, the peptide is cleaved (Example 2)
and its purity analyzed by mass spectroscopy or high
performance liquid chromatography (HPLC).

In a second synthesis method, the peptide can be
synthesized on any convenient solid phase substrate in a
standard manner and then cleaved from the substrate. The
peptide is purified by standard methods (e.g., HPLC) and then
attached to the NMR measurement substrate. The attachment
can be done by any method known in the art, preferably at
either the amino- or carboxy-terminus, e.g., by condensation
of the free carboxy terminal group on the peptide with an
amino labeled resin, with the attachment step preceding
deprotection of any side chain carboxy groups on the peptide;
by use of heterofunctional linker groups; etc.

Great care is preferably exercised in forming the
binder-substrate used for the REDOR NMR measurements. This
invention is also directed to binder-substrates suitable for
precise REDOR NMR measurements in the following environmental
conditions: dry unbound, hydrated unbound, and bound to the
target molecule (e.g., in lyophilized or hydrated forms).

First, for any binder and any NMR measurement substrate
utilized, the substrate should restrain the attached binder
sufficiently so that binder motion will not average out the
dipole-dipole interactions necessary for the REDOR
measurement. Generally, this requires that the frequency of
motion of the binder be less than the frequency of the
dipole-dipole interaction being observed, which varies with
the nuclear species being observed and the measurement
distance. For 13C-15N observations to 2.5 Å, the binder motion
frequency should be less than approximately 200 Hz; for
observations to 5 Å, less than approximately 30-50 Hz; and
for observations beyond 5 Å, down to less than approximately
10 Hz. The more polar the substrate, such as glass beads or
p-methylbenzhydrylamine ["mBHA"] resin, the more strongly
polar attached binders (such as many peptides) are restrained.
Less polar substrates, such as polystyrene resin, provide
less restraint for polar binders. In an embodiment wherein
a peptide comprising the sequence CX6C is bound to an mBHA
resin with a glycine residue serving as a linker to a
binding site on the resin, probably no additional steps need
be taken for 2.5 Å measurements. Additional steps that can
be used, if needed, to slow binder motions include cooling
the measurement sample to, for example, liquid N2 temperatures
(approximately 77 K) or binding to a large, relatively
immobile target molecule.
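
By way of illustration only, the frequency limits quoted above
can be compared with the static dipolar coupling of an isolated
13C-15N pair. The following Python sketch assumes the standard
point-dipole expression D = (mu0/4 pi) |gamma_C gamma_N| hbar /
(2 pi r^3); the constants and the function name
dipolar_coupling_hz are illustrative assumptions and are not
taken from this specification.

```python
import math

# Physical constants (SI). Gyromagnetic ratios are literature values
# and are assumptions of this sketch, not figures from the specification.
MU0_OVER_4PI = 1.0e-7        # T*m/A
HBAR = 1.054571817e-34       # J*s
GAMMA_13C = 67.2828e6        # rad/(s*T)
GAMMA_15N = -27.116e6        # rad/(s*T); only the magnitude matters here


def dipolar_coupling_hz(r_angstrom, gamma_i=GAMMA_13C, gamma_s=GAMMA_15N):
    """Return the point-dipole coupling constant in Hz for a spin pair."""
    r_m = r_angstrom * 1.0e-10
    return (MU0_OVER_4PI * abs(gamma_i * gamma_s) * HBAR
            / (2.0 * math.pi * r_m ** 3))


if __name__ == "__main__":
    for r in (2.5, 5.0, 7.0):
        print(f"r = {r:3.1f} A  ->  D = {dipolar_coupling_hz(r):7.1f} Hz")
```

Under these assumptions the coupling at 2.5 Å is close to the
200 Hz limit stated above, and the coupling at 5 Å falls
somewhat below the stated 30-50 Hz range.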

Second, the net binder density is important and
typically is adjusted. The substrate preferably has an
adjustable number of binder synthesis sites or binding sites
per unit of substrate surface area. Too high a binder
density on the substrate surface will cause inter-molecular
nuclear dipole-dipole interactions to distort the REDOR
distance measurements. To obtain accurate intra-molecular
distances, the peptides should be kept sufficiently far apart
so that only intra-molecular nuclear dipole-dipole
interactions are significant. Inter-molecular nuclear
dipole-dipole interactions are preferably kept less than
about 10% of the intra-molecular interaction. In the case of
13C-15N measurements, this criterion can be monitored by
observing 13C-13C dipolar couplings. As the dipole interaction
falls off as R^-3, keeping adjacent binders apart by more than
approximately 2-3 times the distance to be measured is
sufficient. For measurements to 5 Å, this criterion can be
satisfied by keeping binders approximately 10 Å or more
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approximately 50 A- Preferable substrate pore sizes for use 
with such moderate sized protein targets are no less than
100-200 Å. Excessive pore sizes can result in a too dilute
binder that decreases NMR signal intensity. The preferable
pore sizes also facilitate high purity peptide synthesis
directly onto substrate resins by similarly facilitating
diffusion of reagents and solvents to synthesis sites. Also,
binder-substrate binding is preferably of such a nature that
it will not be disrupted under dry conditions, aqueous
conditions, or conditions suitable to binder-target binding.
Generally, adequate pore sizes are in the range of 100-500 Å,
although this will vary with the size of the target molecule.

Solid phase substrates that can be used include but are
not limited to mBHA resins, divinylbenzyl polystyrene resins,
and glass beads. All of these substances can be manufactured
to have binding sites in the range from 0 to 1.0 mmol/g. In
addition, these substrates can be made so as to have the
following surface areas: for mBHA about 100 m²/g, for
polystyrene from 50-100 m²/g, and for glass from 0.1-100 m²/g.
These substrates also can be manufactured so as to have a
surface binding site density in the range of from 0 to 1.0
mmol/m². More generally any microporous material with a
surface density of binding sites adjustable from 0 to at
least 1.0 mmol/m², and preferably with pore sizes in the
preferred ranges, can be used. Suppliers of such adjustable
resins include Chiron Mimotope Peptide Systems (San Diego,
CA) and Nova Biochem (San Diego, CA).

Peptide binders can be synthesized directly on the
surface of the substrates, by way of example as set forth in
Section 6.6 infra, to achieve a purity of preferably at least
90%, more preferably at least 95%. In the case of a peptide
comprising the sequence CX6C, the preferred peptide spacing on
the substrate is no closer than approximately 10 Å, or a
peptide density of no greater than one peptide every 100 Å².

Peptide synthesis on the preferred resin
p-methylbenzhydrylamine ["mBHA"] with 0.16 mmole/g of peptide
binding sites, a surface of 100 m²/g, and a preferable pore
size of 100-200 Å results in a binder-substrate having such a
preferable peptide surface density and suitable for accurate
REDOR NMR measurements in dry, hydrated, and bound
conditions. The total binder density is more than tenfold
above instrumental sensitivity. The glycine linker provides
a sufficient spacer from the substrate surface.
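
As an illustrative check of the figures above, the following
Python sketch converts a resin loading and specific surface area
into an average area per binding site. It also computes the
spacing ratio at which a single neighboring dipole contributes
no more than 10% of the intra-molecular interaction under an
R^-3 falloff. The function names are hypothetical, and the
calculation assumes sites are spread uniformly over the surface
and considers only one neighbor at a time.

```python
import math

AVOGADRO = 6.02214076e23  # sites per mole


def area_per_site_A2(loading_mmol_per_g, surface_m2_per_g):
    """Average surface area per binding site, in square angstroms."""
    sites_per_g = loading_mmol_per_g * 1.0e-3 * AVOGADRO
    area_A2_per_g = surface_m2_per_g * 1.0e20  # 1 m^2 = 1e20 A^2
    return area_A2_per_g / sites_per_g


def min_spacing_factor(max_fraction=0.10):
    """Smallest R/r such that (r/R)**3 <= max_fraction."""
    return max_fraction ** (-1.0 / 3.0)


if __name__ == "__main__":
    area = area_per_site_A2(0.16, 100.0)
    print(f"area per site  : {area:.1f} A^2")
    print(f"mean spacing   : {math.sqrt(area):.1f} A")
    print(f"spacing factor : {min_spacing_factor():.2f} x measured distance")
```

With the stated 0.16 mmole/g and 100 m²/g, this gives roughly
one site per 100 Å², consistent with the preferred density, and
a spacing factor slightly above 2, consistent with the stated
2-3 times the measured distance.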

Steps 43, 44, and 45 in the preferred embodiment of the
invention are carried out by one of a number of commercial
peptide synthesis sources, such as Chiron Mimotope Peptide
Systems (San Diego, CA) and Nova BioChem (San Diego, CA).
Methods that can be used in these steps are known in the art.
However, the preferred practice of these steps is detailed in
the example in Section 6.6.

The invention thus provides a method of performing solid
state NMR, preferably REDOR NMR, measurements of molecules on
a solid phase substrate. In one embodiment, the molecule is
a compound having conformational degrees of freedom at the
temperature of interest that are limited to torsional
rotations about bonds between otherwise rigid subunits, the
torsional rotations respecting any conformational
constraints. The molecule is preferably a peptide, more
preferably a peptide of constrained conformation, and is most
preferably a peptide having one or more cystines (e.g.,
comprising the sequence CX6C). In other embodiments, the
molecule is a peptide analog or derivative. In a preferred
embodiment, the substrate is a solid phase on which the
molecule (e.g., peptide) has been synthesized, with a high
degree of purity. In specific embodiments, the REDOR
measurements of the molecule on the substrate can be done in
a dry nitrogen atmosphere, under hydrated conditions, and
when the molecule is either free or bound to a target. The
invention is also directed to a solid phase substrate having
a surface to which is attached a population of molecules
(preferably peptides, peptide derivatives, or peptide
analogs), suitable for obtaining REDOR NMR measurements of
the molecules. In specific embodiments, at least 90% of the
population consists of a single molecule (i.e., 90% purity).
In a more preferred aspect, 95% purity is present. Methods
of producing such solid phase substrates, as described above,
are also provided.

In step 46, REDOR spectroscopy is performed on the
strategically labeled binder peptide-resin sample. Step 46
details include final sample preparation, spectrometer
parameters and tuning, and excitation pulse sequence. Sample
preparation can be carried out by standard methods. The
binder peptide-substrate sample is dried in N2, and an
approximately 0.1 g amount is sealed in the NMR measurement
rotor. The rotor can be cooled, if necessary, to limit
binder motion.

An alternative final sample preparation step is to bind
the target molecule to the binder peptide-resin sample and
then dry the complex in N2. Optionally, the binder peptide
can be split from the resin before binding to the target. In
this alternative, the highly accurate REDOR NMR distances are
of the bound binder and thus reflect any conformational
changes that occur upon binding with the target.

A triple resonance, magic angle spinning ["MAS"] NMR
machine is adaptable to REDOR measurements. Such machines
are commercially available from Bruker (Billerica, MA),
Chemmagnetics (Fort Collins, CO), and Varian (Palo Alto, CA).
An exemplary machine suitable for use is in the laboratory of
Prof. Zax, Cornell University (Ithaca, NY). This machine
includes a 7.05 Tesla magnet from Oxford Instruments (Oxford,
United Kingdom) and RF pulse excitation and receiving
hardware conventional in the NMR art. An exemplary
measurement rotor is a triple resonance MAS probe from
Chemmagnetics.

The exemplary magnetic field is adjusted for a 1H Larmor
frequency of 300 MHz, with corresponding Larmor frequencies
for 13C and 15N of 75.4 and 30.4 MHz, respectively. An
exemplary probe spin frequency (ν_r) is 4.8 kHz, with
corresponding rotor period (T_r) of 0.208 msec. 15N resonances
are measured. The low natural abundance of 15N eliminates the
need for natural background corrections. Alternatively, 13C
measurements can be done with conventional background
corrections.
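
The exemplary spectrometer figures can be cross-checked with a
short calculation. The following Python sketch multiplies the
7.05 Tesla field by literature values of the gyromagnetic ratios
(divided by 2 pi) and inverts the 4.8 kHz spin rate to obtain
the rotor period. The ratio table and function names are
assumptions introduced here for illustration.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (magnitudes).
# Literature values; assumed here, not given in the specification.
GAMMA_BAR_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": 4.316}


def larmor_mhz(nucleus, field_tesla):
    """Larmor frequency in MHz for the named nucleus at the given field."""
    return GAMMA_BAR_MHZ_PER_T[nucleus] * field_tesla


def rotor_period_ms(spin_rate_khz):
    """Rotor period in milliseconds for a spin rate in kHz."""
    return 1.0 / spin_rate_khz


if __name__ == "__main__":
    for nucleus in ("1H", "13C", "15N"):
        print(f"{nucleus:>3}: {larmor_mhz(nucleus, 7.05):6.1f} MHz")
    print(f"T_r: {rotor_period_ms(4.8):.3f} msec")
```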

REDOR is a pulse NMR technique requiring careful
excitation of appropriate 1H, 13C, and 15N resonances
synchronous with the MAS rotor and followed by observation of
the 15N free induction decay. Many alternative REDOR
excitation sequences have been described in the literature,
some of which are found in the references cited hereinabove.
These sequences can involve multiple 13C excitations per rotor
period. The simple pulse sequence preferred for use in this
invention requires only one 13C excitation per period.

The exemplary sequence for 8 rotor periods is
illustrated in Fig. 4, and is detailed herein in a manner
such that those skilled in the NMR arts can program an NMR
spectrometer for similar measurement. The three channels
excited are the 1H channel 50, the 13C channel 51, and the 15N
channel 52. The 13C and 15N RF power supplies are tuned to the
resonances of the nuclei whose distance is to be measured.
The 1H channel RF power is initially tuned to the resonance of
a proton coupled to the 15N of interest. The time sequence
(increasing to the right) of the exciting signals (increasing
vertically) in each of these channels is illustrated.

In the 15N channel, an initial excitation is applied to
the 15N spins in either of two manners: either an initial π/2
pulse may be applied or, as illustrated and preferred, a
cross polarization transfer from the protons is made.
Sufficient RF intensity is applied at time 54 in both the 1H
and 15N channels, 50 and 52 respectively, to achieve a
Hartmann-Hahn precession match at a π spin flip time of 13.2
μsec. Subsequent to the initial 15N excitation, synchronous π
pulses 56 are applied in phase with the MAS probe rotor for N_c
rotor cycles, denoted by line 59, with sufficient RF
intensity to achieve a π spin flip time of 13.2 μsec. The
phase of these π pulses is varied systematically to reduce
artifacts in a manner well known in the NMR arts. The
preferred sequencing is detailed in Table 1.

Table 1

15N π Pulse Phase Sequencing

Number of rotor cycles between     Phase sequence
excitation and observation         (in precessing frame)
---------------------------------------------------------
2                                  YY
4                                  XYXY
8                                  XYXYYXYX

The phase sequence is expressed as the axis, in the frame
precessing with the 15N spins, about which the π spin flip is
made. This axis is systematically varied depending on the
number of rotor periods intervening between the 15N excitation
and signal observation. The illustrated phase sequences may
be varied into equivalent sequences in a conventional manner.
For example, "XYXY" is equivalent to "-YX-YX". Finally, at
501 the free induction decay of the 15N spins is observed and
generates the time domain output signal.
In the 1H channel, the preferred sequence is an initial
exciting π/2 pulse 53 followed by the previously described
cross polarization transfer 54 to the 15N spins. The less
preferred sequence omits these initial pulses in favor of a
π/2 15N excitation. During the subsequent spin evolution time
for N_c rotor cycles and the free induction decay time 501, a
decoupling field 55 is applied to the protons. The preferred
decoupling field has a 66 kHz RF intensity to achieve a 1H π
spin flip in 7.6 μsec.
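
The relation between an RF field intensity and the corresponding
π spin flip time can be illustrated as follows. A π pulse is one
half of a nutation cycle, so the nutation frequency is the
reciprocal of twice the flip time. The Python sketch below
applies this relation to the flip times quoted in this section;
the function name nutation_khz is an illustrative assumption.

```python
def nutation_khz(pi_pulse_usec):
    """RF nutation frequency in kHz for a given pi-pulse length in usec.

    A pi pulse is half a nutation cycle: nu_1 = 1 / (2 * t_pi).
    """
    return 1.0e3 / (2.0 * pi_pulse_usec)


if __name__ == "__main__":
    for label, t_pi in (("1H decoupling", 7.6),
                        ("15N pi pulses", 13.2),
                        ("13C pi pulses", 10.6)):
        print(f"{label:14s} t_pi = {t_pi:5.1f} usec -> {nutation_khz(t_pi):5.1f} kHz")
```

For the 7.6 μsec 1H flip time this reproduces the stated 66 kHz
decoupling field to within rounding.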

In the 13C channel, two distinct options must be
measured. The first option (not illustrated) has no 13C
exciting pulses. The second option (illustrated) has
synchronous π pulses 57 applied for N_c rotor cycles at the
rotor frequency but with a fixed phase delay 58, denoted by
t_1, and at a signal intensity sufficient to achieve a
π spin flip time of 10.6 μsec. Any value of t_1 may be used;
the preferred value is 1/2 the rotor period, T_r/2.
Alternative REDOR pulse sequences include 2 or more 13C pulses
per rotor cycle.

Summarizing, still with reference to Fig. 4, a REDOR
measurement scan is characterized by the number of rotor
cycles, N_c, of spin evolution. A complete scan comprises,
first, an equilibration period, preceding the illustrated
pulse sequences. Second, there is a 15N excitation period
comprising pulses 53 and 54. Third, there is a spin
evolution period for N_c rotor cycles which has two options,
both measured. Both options comprise the application of
decoupling 1H field 55 and synchronous in-phase 15N π pulses
56. The first option has no 13C excitation; the second has
synchronous phase-displaced 13C π pulses 57. Fourth, and
finally, there is observation of free induction decay 501 of
the 15N spins. Fig. 4 illustrates an N_c of 8. Each scan
option is repeated, and the induction decay signal
accumulated, for a sufficient number of times to obtain an
acceptable signal to noise ratio. With the preferred
practice, this has required less than approximately 5,000
scans, and typically 3,000 have been sufficient.

An alternative implementation of the REDOR measurement
interchanges the roles of 13C and 15N and measures the free
induction decay of 13C. Further, the invention is not limited
to this described pulse sequence and is adaptable to
equivalent pulse sequences yielding direct inter-nuclear
dipole-dipole interaction strengths.

Following REDOR measurement step 46 is data analysis
step 47. This comprises several substeps. As is
conventional, the free induction decay signal is Fourier
transformed from the time domain to the frequency domain.
The scan option without the 13C excitation produces a
transformed signal with an observed 15N resonance peak of
magnitude S; the scan option with 13C excitation produces an
observed 15N resonance peak of magnitude S_f. The REDOR output
signal, denoted ΔS/S, is conventionally formed according to
the equation:

    \frac{\Delta S}{S} = \frac{S - S_f}{S}     (2)

The output signal is observed for different N_c. Preferably 0,
2, 4, and 8 rotor cycles are observed. Other preferred N_c
will be apparent during the following description.

Further analysis of the REDOR output signal, ΔS/S, is
made clearer by a very brief explanation of how this output
signal represents the spin 1/2 dipole-dipole interaction
between the 13C and 15N. In the spin evolution period, the 1H
decoupling excitation eliminates all proton effects from the
13C and 15N NMR spectra. Magic angle spinning, in the scan
option without any 13C excitation, eliminates all nuclear
dipole-dipole and chemical shift anisotropy from the NMR
line. Thus signal S represents an NMR resonance without any
dipole interaction. However, in the second scan option, the
13C π spin flip pulses reintroduce in a controlled manner the
dipole-dipole interaction. This interaction causes
additional dephasing, or loss of signal strength, in the
observed 15N signal. Thus signal S_f represents an NMR
resonance with dipole interaction and the output signal ΔS/S
represents the percentage strength of pure dipole-dipole
interaction between the 13C and 15N nuclei. The exact loss of
signal strength depends on the timing of the 13C pulses and
the number of rotor cycles for which they are applied.

In the alternative where a general phase delay, t_1, is
used, the expression for the REDOR signal is derived by
numerically integrating the following equations from the Pan
et al. reference (1990, J. Magnetic Resonance 90:330-340):

    S_f = \int_0^{2\pi} \int_0^{\pi/2} \cos[ T_r N_c \omega_D(\alpha, \beta, t_1) ] \sin\beta \, d\beta \, d\alpha     (3)

where

    \omega_D(\alpha, \beta, t) = \pm \frac{1}{2} D_{CN} [ \sin^2\beta \cos 2(\alpha + \omega_r t) - \sqrt{2} \sin 2\beta \cos(\alpha + \omega_r t) ]     (4)

This integration can be done by standard numerical
integration techniques such as are found in Press et al.,
Numerical Recipes: The Art of Scientific Computing,
Cambridge, U.K., Cambridge University Press (1986), chapter
4, which is herein incorporated by reference. Alternatively
the expression can be directly evaluated from the symbolic
representations by numerical tools such as Mathematica from
Wolfram Research Inc. (Champaign, IL) or Mathcad from
Mathsoft Inc. (Cambridge, MA). In a preferred embodiment,
however, a much simpler approach is used.
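
One possible form of the numerical integration is sketched
below in Python. It is not the program of this invention. It
assumes the conventional REDOR treatment in which the dipolar
frequency changes sign at each 13C π pulse, so that the phase
accumulated per rotor cycle is twice the integral of the
dipolar frequency from 0 to t_1. It further assumes D_CN is
expressed in Hz, with lam = N_c * T_r * D_CN, and normalizes
the powder average numerically so that S_f/S equals 1 when
there is no coupling. All names are hypothetical.

```python
import math


def redor_phase(lam, alpha, beta, t1_frac):
    """Accumulated dephasing angle for one crystallite orientation.

    lam is N_c * T_r * D (D in Hz); t1_frac is t_1 / T_r.
    Assumes one sign flip of the dipolar frequency per rotor cycle.
    """
    theta = 2.0 * math.pi * t1_frac
    two_omega = 0.5 * math.sin(beta) ** 2 * (
        math.sin(2.0 * alpha + 2.0 * theta) - math.sin(2.0 * alpha))
    one_omega = math.sqrt(2.0) * math.sin(2.0 * beta) * (
        math.sin(alpha + theta) - math.sin(alpha))
    return lam * (two_omega - one_omega)


def redor_sf_over_s(lam, t1_frac=0.5, n_alpha=180, n_beta=180):
    """Powder-averaged S_f / S by the midpoint rule over alpha and beta."""
    num = 0.0
    den = 0.0
    for i in range(n_beta):
        beta = (i + 0.5) * (math.pi / 2.0) / n_beta
        weight = math.sin(beta)
        for j in range(n_alpha):
            alpha = (j + 0.5) * 2.0 * math.pi / n_alpha
            num += weight * math.cos(redor_phase(lam, alpha, beta, t1_frac))
        den += weight * n_alpha
    return num / den


if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0):
        print(f"lambda = {lam:3.1f}  dS/S = {1.0 - redor_sf_over_s(lam):.4f}")
```

For small values of lam and t_1 equal to T_r/2, this sketch
approaches the limiting behavior (16/15) lam^2, which provides
a convenient check on the integration grid.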

In the preferred embodiment, the 13C pulse phase delay is
1/2 the rotor period, T_r, and the preceding equations can be
simply expressed (Mueller et al., 1995, J. Magnetic
Resonance, in press):

    \frac{\Delta S}{S} = 1 - [J_0(\sqrt{2}\lambda)]^2 + 2 \sum_{k=1}^{\infty} \frac{1}{16k^2 - 1} [J_k(\sqrt{2}\lambda)]^2     (5)

    \lambda = N_c T_r D_{CN}

where J_k is a Bessel function of the first kind. Adequate
accuracy is obtained by limiting the summation of equation 5
to its first five terms. Fig. 5 is a graph of this equation.
Vertical axis 61 represents ΔS/S; horizontal axis 62
represents λ; and graph 63 represents equation 5.
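
A minimal Python sketch of the evaluation of equation 5 follows,
truncated to the first five terms of the summation as stated
above. It assumes the series has the form 1 - J_0(x)^2 +
2 sum_k J_k(x)^2 / (16 k^2 - 1) with x = sqrt(2) lam, and it
evaluates the Bessel functions from their integral
representation rather than by the methods of Press et al. The
function names are illustrative only.

```python
import math


def bessel_j(k, x, n=200):
    """Bessel function of the first kind, integer order k.

    Uses J_k(x) = (1/pi) * integral_0^pi cos(k*t - x*sin(t)) dt,
    evaluated with the midpoint rule on n points.
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.cos(k * t - x * math.sin(t))
    return total / n


def redor_ds_over_s(lam, n_terms=5):
    """Delta S / S from the Bessel series, truncated to n_terms terms."""
    x = math.sqrt(2.0) * lam
    value = 1.0 - bessel_j(0, x) ** 2
    for k in range(1, n_terms + 1):
        value += 2.0 / (16 * k * k - 1) * bessel_j(k, x) ** 2
    return value


if __name__ == "__main__":
    for lam in (0.0, 0.25, 0.5, 1.0, 2.0):
        print(f"lambda = {lam:4.2f}  dS/S = {redor_ds_over_s(lam):.4f}")
```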

In detail, step 47 of Fig. 3 uses equation 5 and the
REDOR output signal, ΔS/S, for various values of N_c to obtain
a best value for D_CN, the dipole interaction strength. The
internuclear distance is simply and directly determined from
D_CN by equation 1. An exemplary method for finding the best
value of D_CN is to use a least squares method. First, form
the sum of the squares of the differences of the observed
ΔS/S and ΔS/S computed from equation 5, which will be a
function of D_CN, T_r, and N_c through λ. Second, find the
value of D_CN minimizing this function by searching
exhaustively in sufficiently small increments over the
relevant range. For example, D_CN can be varied by varying R
in 0.01 Å increments from 0.5 to 8 Å. More efficient
minimization methods as presented in Press et al., chapter 10,
can also be used. Values of the Bessel functions can be
simply calculated by the methods in Press et al., supra,
§ 6.4. Alternatively, this minimization and best value
determination is easily performed directly from the symbolic
representations with the previously cited mathematical
packages.
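
The exhaustive least squares search can be sketched in Python as
follows. The sketch is illustrative only. It assumes that
equation 1 has the point-dipole form D_CN = C / R^3 with C of
about 3062 Hz Å^3 for a 13C-15N pair, that equation 5 has the
Bessel series form used below, and that the observed values are
supplied as a mapping from N_c to ΔS/S. The data in the example
are synthetic, generated from the same model at R = 4.0 Å, and
are not measurements.

```python
import math

DIPOLAR_CONST_HZ_A3 = 3062.0  # assumed 13C-15N constant, D = C / R**3


def bessel_j(k, x, n=128):
    """Integer-order Bessel function via its integral representation."""
    h = math.pi / n
    return sum(math.cos(k * (i + 0.5) * h - x * math.sin((i + 0.5) * h))
               for i in range(n)) / n


def redor_ds_over_s(lam, n_terms=5):
    """Delta S / S from the five-term Bessel series (assumed form)."""
    x = math.sqrt(2.0) * lam
    value = 1.0 - bessel_j(0, x) ** 2
    for k in range(1, n_terms + 1):
        value += 2.0 / (16 * k * k - 1) * bessel_j(k, x) ** 2
    return value


def model(r_angstrom, n_c, t_r_sec):
    """Predicted Delta S / S for distance R, N_c rotor cycles, period T_r."""
    d_cn = DIPOLAR_CONST_HZ_A3 / r_angstrom ** 3
    return redor_ds_over_s(n_c * t_r_sec * d_cn)


def fit_distance(observed, t_r_sec, r_min=0.5, r_max=8.0, step=0.01):
    """Grid search over R minimizing the sum of squared residuals."""
    best_r, best_sse = None, float("inf")
    n_steps = int(round((r_max - r_min) / step))
    for i in range(n_steps + 1):
        r = r_min + i * step
        sse = sum((obs - model(r, n_c, t_r_sec)) ** 2
                  for n_c, obs in observed.items())
        if sse < best_sse:
            best_r, best_sse = r, sse
    return best_r


if __name__ == "__main__":
    t_r = 0.208e-3  # rotor period in seconds
    synthetic = {n_c: model(4.0, n_c, t_r) for n_c in (2, 4, 8)}
    print(f"recovered R = {fit_distance(synthetic, t_r):.2f} A")
```

A grid search is used here because it matches the exhaustive
search described above; any of the more efficient minimizers
mentioned could replace fit_distance without changing the model.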

The example in Section 6.6 provides typical results of
this measurement and analysis method.

This completes the method of Fig. 3 and determines the
internuclear distance between the 13C and 15N nuclei to which
the excitation channels were tuned for the REDOR NMR
measurements. If other C-N pair distances are to be
determined in the labeled binder, step 46 as detailed above
is repeated for the other distinct resonances. If the
alternative 15N resonances cannot be distinguished, separately
labeled binders are prepared and measured.

5.7. CONSENSUS, CONFIGURATIONAL BIAS MONTE CARLO
Broad overview

With reference to Fig. 1, having found N specifically
binding members of one or more libraries, step 2, selected a
candidate pharmacophore shared by all these binders, step 3,
and determined a few strategic distances in the vicinity of
the candidate pharmacophore, step 4, precise pharmacophore
and binder peptide structures are now determined by the
preferred method, the consensus, configurational bias Monte
Carlo method. Other orderings and identities of these steps
are possible. For example, the binders may be predetermined,
thereby rendering step 2 unnecessary. Further, no strategic
distance measurements may need to be made, and step 4 may be
omitted. Alternatively, a partial structure determination
step may be inserted before step 4 to guide selection of
distances for measurement.
Pharmacophore structure determination of this invention
is not limited to the CCBMC method to be described. CCBMC
makes the most efficient use of heuristic consensus binding
and partial distance measurement information. However, the
consensus pharmacophore can be determined by methods
including but not limited to use of exhaustive REDOR NMR
measurements or by extensive but fewer REDOR measurements in
conjunction with a conventional molecular structure
determination method, such as molecular dynamics,
conventional Monte Carlo, or even peptide folding rules.

In the following description, the CCBMC method is
broadly overviewed; subsequently, details of important steps
are described; and finally a description of the preferred
computer method and apparatus for practicing the invention is
given. From the description of the methods, equations, data
structures, and programs provided herein, one will be able
readily to translate them into implementations.

Although the following descriptions are directed to
binders isolated from the preferred library of peptides
comprising the sequence CX6C (constrained by disulfide bonds),
the method is applicable to more general organic diversity
library members. It is immediately applicable to compounds
from constrained peptide libraries with other scaffolds and
also to compounds from similar peptoid libraries. It will be
readily apparent that the method is applicable to any
compounds whose structural region of interest exhibits
conformational degrees of freedom at a temperature of
interest (e.g., body temperature, 37 °C) that are limited to
torsional rotations of rigid molecular subunits about bonds
between the subunits, in which any loops present in the
structural region of interest are independently rotatable by
concerted rotation (see Section 7, Appendix: Concerted
Rotation). Examples of such compounds include but are not
limited to peptides, peptoids, peptide derivatives, peptide
analogs, etc., including members of libraries discussed in
Section 5.2, supra.

General features of Monte Carlo simulation methods are
known. A reference is Rowley, Statistical Mechanics for
Thermophysical Property Calculations, Englewood Cliffs, N.J.,
PTR Prentice Hall (1994), especially chapters 5 and 7, which
is herein incorporated by reference. The application of
simple Monte Carlo to constrained peptides has conventionally
been hindered by difficulty generating geometrically proper
and energetically useful conformational alterations, and by
the consequent wasteful and inefficient exploration of
conformational space. This method overcomes these problems
for constrained peptides with a novel combination of
techniques. In addition, this method is uniquely able to
incorporate partial information about binding affinities and
distance measurements to improve determination of the
pharmacophore structure, one goal of the invention.

Fig. 8 is an overview of the method. Step 91 represents
the initial geometric and chemical structure of each binding
peptide in computer memory. Peptide geometric structure is
represented as a set of records, each record representing one
rigid subunit or one atom of the peptide. The subunit
records are linked together as the subunits are linked in the
peptide molecule. Each rigid unit record includes fields for
the composition, structure, and connectivity of the rigid
unit represented. Since the rigid units only undergo
torsional rotations about mutual bonds, their internal
geometric structure is fixed.
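
One way such linked rigid unit records might be laid out is
sketched below in Python. This is not the data structure of the
program listings of this specification. The class name
RigidUnit, its field names, and the helper functions are all
hypothetical, chosen only to mirror the composition, structure,
and connectivity fields described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(eq=False)
class RigidUnit:
    """One rigid subunit (or single atom) of a peptide."""
    name: str                      # composition label (hypothetical)
    atoms: List[str]               # element symbols of member atoms
    local_coords: List[Vec3]       # fixed internal geometry, local frame
    torsion_deg: float = 0.0       # rotation about the bond to the parent
    parent: Optional["RigidUnit"] = None
    children: List["RigidUnit"] = field(default_factory=list)


def link(parent: RigidUnit, child: RigidUnit) -> None:
    """Record that child is bonded to parent."""
    child.parent = parent
    parent.children.append(child)


def count_units(root: RigidUnit) -> int:
    """Count the records reachable from root by following the links."""
    return 1 + sum(count_units(c) for c in root.children)


if __name__ == "__main__":
    n_term = RigidUnit("N-terminus", ["N"], [(0.0, 0.0, 0.0)])
    c_alpha = RigidUnit("C-alpha", ["C"], [(0.0, 0.0, 0.0)])
    side = RigidUnit("side chain", ["C"], [(0.0, 0.0, 0.0)])
    link(n_term, c_alpha)
    link(c_alpha, side)
    print(count_units(n_term))
```

Only the torsion angle of each record changes during a move;
the local coordinates stay fixed, reflecting the rigidity of
the units.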

If a previous run with these peptides has been done,
peptide initial structure may be chosen as one of the
structures generated late in that run. Such an initial
structure is desirable since the effects of arbitrary initial
conditions have been eliminated. Alternatively, an initial
structure is generated from a prototypical backbone without
side chains by adding side chains with random torsional
orientations. For members of each type of diversity library,
a prototypical backbone meeting structural constraints and
representing an allowed configuration for a member possessing
no side chains can be defined. The prototypical backbone for
the CX6C library is generated from the CCBMC model itself as
run for the linear peptide C(gly)6C (SEQ ID NO:7) using a
Hamiltonian consisting of a single term. The term
contains only terms which, in the disulfide bond backbone
region -C1-S1-S2-C2-, limit the S1-S2 distance to 2.038 Å and
both the C1-S2 and the S1-C2 distances to 2.883 Å. When run
for a linear peptide, no Type II backbone moves are made.
Only Type I backbone moves which remove and regrow randomly
selected portions of the backbone are used to generate
backbone alterations. The model is run with temperatures
gradually decreasing from room temperature to a small
temperature, approximately 1 K. The final low temperature
structure is used for the prototypical backbone. Backbones
for similar constrained peptide libraries can be constructed
in similar manners.

In memory, for each peptide, a current structure is
represented, the initial current structures being the just
assigned initial structures. Also in memory is represented a
proposed modified structure for one peptide. At step 92 the
processor generates "moves" that transform the current
structure of a randomly chosen peptide into a proposed
modified structure. The moves mimic body temperature (37 °C)
thermal agitation experienced by the binders so that their
equilibrium structure may be determined.

Generation of these moves for conformationally
constrained peptides is an important aspect of this method.
There are two move types. Type I moves alter the
conformation of the side chain of a randomly chosen amino
acid of the randomly chosen peptide. The alteration is built
by side chain removal followed by side chain regrowth into a
new torsional conformation. During regrowth, unfavorable
overlap with neighboring side chains is avoided. Type II
moves alter the conformation of a limited random region of
the peptide backbone of a randomly chosen binder by
performing linked, or "concerted", rotations, the linking
being such that only four backbone rigid units are spatially
displaced. Thereby the internally bonded ring of 8 amino
acids will not be disrupted. A reference describing a
similar move in linear alkane molecules is Dodd et al., A
concerted rotation algorithm for atomistic Monte Carlo
simulation of polymer melts and glasses, Molecular Phys., vol
78, pp 961 et seq. (1991), which is herein incorporated by
reference. The ratio between the Types I and II moves is an
adjustable parameter with a preferred value of 4.

Another important aspect of this method is that both
moves are selected in a "configurationally biased" manner.
Normal Monte Carlo methods use standard Metropolis
procedures, in which each proposed structure is generated
randomly and independently of the current structure with an
equal a priori probability. However, for complex molecules,
it is known that this typically results in the generation of
many highly improbable or energetically unlikely structures.
In some situations up to 10* wasted moves are generated for
each useful move, a very considerable waste of processor
resources. In contrast, the method of this invention
generates proposed structures according to an a priori
probability depending on the current structure and the
energetic cost of the new structure. This bias toward more
acceptable structures of lower energy avoids generating
highly improbable structures, making a very much more
efficient use of processor resources. Because detailed
balance must be satisfied, the acceptance probability of the
configurationally biased method must include factors in
addition to the usual Boltzmann factor. A reference applying
a similar method for simple linear alkanes is Smit et al.,
Computer simulations of the energetics and siting of
n-alkanes in zeolites, J. Phys. Chem. vol 98, pp 8442 et seq.
(1994), which is herein incorporated by reference.
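
The idea of biased selection with a compensating acceptance
factor can be illustrated by the textbook Rosenbluth scheme,
sketched below in Python for a single regrowth step. This is a
generic sketch and is not asserted to be the acceptance rule of
this invention. Energies are assumed to be in kcal/mol, the
trial energies would come from an energy function not shown
here, and all names are hypothetical. In a multi-step regrowth
the log weights of the individual steps would be summed, and
the weight of the current structure would be obtained by
retracing it in the same way.

```python
import math
import random

K_B_KCAL = 0.0019872  # Boltzmann constant, kcal/(mol*K); units assumed


def rosenbluth_select(trial_energies, temperature_k, rng=random):
    """Pick one trial with probability proportional to exp(-E/kT).

    Returns (chosen index, log of the Rosenbluth weight of this step).
    """
    beta = 1.0 / (K_B_KCAL * temperature_k)
    e_min = min(trial_energies)
    # Shift by e_min so the exponentials cannot overflow.
    weights = [math.exp(-beta * (e - e_min)) for e in trial_energies]
    total = sum(weights)
    pick = rng.random() * total
    running = 0.0
    chosen = len(weights) - 1
    for idx, w in enumerate(weights):
        running += w
        if pick < running:
            chosen = idx
            break
    log_weight = -beta * e_min + math.log(total)
    return chosen, log_weight


def acceptance_probability(log_w_new, log_w_old):
    """min(1, W_new / W_old), computed from log weights."""
    diff = log_w_new - log_w_old
    return 1.0 if diff >= 0.0 else math.exp(diff)


if __name__ == "__main__":
    random.seed(1)
    idx, log_w = rosenbluth_select([1.2, 0.3, 4.0, 0.9], 310.0)
    print(idx, round(log_w, 3), acceptance_probability(log_w, 0.0))
```

The ratio of Rosenbluth weights in the acceptance test is the
additional factor that restores detailed balance when trial
structures are not generated with equal a priori probability.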

At step 93 the processor evaluates the energy, or
Hamiltonian, of the proposed configuration. The Hamiltonian
contains two groups of terms: conventional physical energy
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the N binders together; no physical intermolecular effects 
are considered. The binders are otherwise treated 
independently by the method. 

The measurement constraint term of the Hamiltonian is added
to represent the distance measurements made, which are in fact
actual distances in the molecules and constrain any simulated
structure. This term makes energetically unfavorable, by
adding pseudo chemical bonds of the measured lengths, moves
that cause the constrained internuclear distances to depart
from their measured values. Of course, if no partial distance
measurements have been made or are otherwise available, this
term may simply be omitted from the Hamiltonian without
adversely affecting the practice of this step. Which
measurements to make, if any, is guided by the results of the
consensus structure determined. If an adequate structure can
be obtained without assistance of distance measurements, none
need be incorporated. If inadequate results are obtained,
additional iterations of the method will need distance
measurement inputs.
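
A pseudo bond penalty of the kind described can be sketched in
Python as follows. The functional form is an assumption: the
text above states only that pseudo chemical bonds of the
measured lengths are added, so a harmonic penalty with an
arbitrary force constant is used here purely for illustration.
The names constraint_energy and k_force are hypothetical.

```python
import math


def constraint_energy(coords, constraints, k_force=100.0):
    """Sum of harmonic penalties for measured internuclear distances.

    coords      : list of (x, y, z) positions in angstroms
    constraints : list of (i, j, measured_distance) tuples
    k_force     : assumed force constant, energy per square angstrom
    """
    total = 0.0
    for i, j, measured in constraints:
        deviation = math.dist(coords[i], coords[j]) - measured
        total += 0.5 * k_force * deviation ** 2
    return total


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (4.3, 0.0, 0.0)]
    measured = [(0, 1, 4.0)]
    print(f"penalty = {constraint_energy(atoms, measured):.2f}")
```

If no distance measurements are available, an empty constraint
list gives a zero contribution, matching the omission of the
term described above.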

Step 94 tests the proposed structure against an
acceptance probability, accept(curr->prop). This acceptance
probability is determined by the energy of the proposed
structure previously computed in step 93. If the proposed
structure fails this test and is not accepted, the method
progresses immediately to step 96. If the proposed structure
meets the test and is accepted, the accepted proposed
structure replaces and becomes the current structure. The
proposed structure of this peptide is also saved (given
certain other conditions detailed later) in a separate memory
store of structures for later analysis. This structure store
is preferably on disk.

Repeated application of the concerted rotation may lead
to a slightly imperfect structure, due to numerical precision
errors. In an alternative embodiment, peptide geometry would
be restored to an ideal state by application of the Random
Tweak algorithm after several thousand moves (Shenkin et al.,
1987, Biopolymers 26:2053-85).
Step 96 tests whether enough structures of equilibrated
total energy have been generated in this simulation run. The
run terminates if a sufficient number have been generated.
Sufficiency is determined on the basis of whether the
statistical sampling error of the average pharmacophore
structure determined at step 97 is adequately small
(typically, less than 0.25 Å). Preferably, 25,000
equilibrated structures would be accumulated for each run.
Also, preferably, three runs would be performed for a total
of 75,000 saved structures.
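
The sufficiency test of step 96 can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a few pharmacophore interatomic distances are tracked over the saved structures and that the sampling error is estimated as the plain standard error of the mean; the names `standard_error` and `run_is_sufficient` are illustrative and do not come from the program listing. Successive Monte Carlo structures are correlated, so this simple estimate may understate the true error.

```python
import math

def standard_error(samples):
    """Standard error of the mean of a list of floats (needs >= 2 samples)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return math.sqrt(var / n)

def run_is_sufficient(distance_samples, tolerance=0.25):
    """True when every tracked distance has a standard error below
    `tolerance` (in angstroms). `distance_samples` maps a label for an
    atom pair to the distances observed in the saved structures."""
    return all(standard_error(s) < tolerance
               for s in distance_samples.values())
```

A run would then be allowed to stop only after `run_is_sufficient` returns True for the saved, equilibrated structures.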

Fig. 9 illustrates energy equilibration of an actual
run. Axis 101 is the total energy of a set of peptide
binders; axis 102 is the number of moves accepted. Traces 103
represent total energies of all binders from each of the
three runs. Typically, run energy equilibrates rapidly,
within approximately 2000 moves in most cases.
Subsequent saved structures are counted toward termination.
Traces 103 display typical energy variations superimposed on
a secular stability. The illustrated energy variations
typically comprise several components having different
variabilities. First, there is a very high frequency
oscillation with a period of a few tens of moves (known as
"hair"). Second, there is a low frequency oscillation with a
period of several hundred to a few thousand moves and with
low amplitude.

Step 97 analyzes the structures stored in memory. In the
simplest preferred embodiment, the stored geometric
structures for each binder are simply averaged, yielding a
final structure for each binder and for the candidate
pharmacophore. In an alternative embodiment, clustering
software seeks clusters of similar structures for each
binder. The clusters are then averaged to give a final
structure for each variant structure for each binder. The
variants represent alternative foldings for the binder.
Exemplary clustering methods are found in Gordon et al.,
Fuzzy cluster analysis of molecular dynamics trajectories,
Proteins: Structure, Function and Genetics 14:249-264 (1992).
Alternative post-processing can be done on the clustered
structures to account for small bond angle vibrations. Such
vibrations are expected to make small perturbations to the
clustered structures determined by the Monte Carlo method and
can be accounted for by a brief molecular dynamics
simulation. Such a simulation is fully defined by the
Hamiltonian, comprising the physical and heuristic energies
to be described infra in Eqn. 8, and by the temperature of
interest. The structures observed during the simulation are
averaged to determine a final, more accurate equilibrium
structure. A code capable of performing such a simulation is
Discover® from BIOSYM (San Diego, CA). Preferably, the
molecular dynamics simulation would be run for approximately
10^5 bond angle vibration periods. Since the typical bond
angle vibration period is 10^-2 ps (1 ps = 10^-12 sec), such
a run will encompass approximately 1 ns of molecular time.

Configurational bias move generation details

One Type I or II move will, in general, alter the
position of several rigid units on a side chain or along the
backbone. Each altered rigid unit is sequentially considered
during move generation. The Hamiltonian describing the
energy of the rigid unit currently being considered in a move
is divided into an internal part, $u^{int}$, and an external
part, $u^{ext}$, where $u^{ext}$ is all energy not included
in $u^{int}$. In the preferred embodiment, $u^{int}$ is set
to 0; an alternative choice would be to include only the
torsional interaction energy between this rigid unit and
units to which it is currently bound. $u^{int}$ generates a
probability distribution, $p^{int}$, according to which is
generated a set, $\phi_k$, k = 1...K, of candidate torsional
angles for the bond between the rigid unit being examined and
rigid units already examined. $u^{ext}$ generates another
probability distribution, $p^{ext}$, according to which one
torsional angle is selected from the prior set as the
proposed new angle for the rigid unit being examined. These
probabilities are defined by the equations:
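$$
p_i^{int}(\phi) \propto \exp\left[-\beta u_i^{int}(\phi)\right], \qquad
p_i^{ext}(\phi_k) = \frac{\exp\left[-\beta u_i^{ext}(\phi_k)\right]}{w_i^{ext}}, \qquad
w_i^{ext} = \sum_{k=1}^{K} \exp\left[-\beta u_i^{ext}(\phi_k)\right] \qquad (6)
$$
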
In this equation, i signifies the rigid unit being
considered, K is the total number of candidate torsional
angles generated by $p^{int}$, and $\beta = 1/kT$ (k is
Boltzmann's constant; T is the temperature, preferably
37 °C). The overall probability of generating a transition
from the current to the proposed structure, and the
probability of accepting the proposed structure, are given
by the equations:



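$$
P(\mathrm{curr} \to \mathrm{prop}) = \prod_{i=1}^{M} p_i^{int}(\phi_i)\, p_i^{ext}(\phi_i), \qquad
W^{new} = \prod_{i=1}^{M} w_i^{ext}, \qquad
\mathrm{accept}(\mathrm{curr} \to \mathrm{prop}) = \min\left(1, \frac{W^{new}}{W^{old}}\right) \qquad (7)
$$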



In this equation, M is the total number of rigid units added
in the move. $W^{old}$ is a weight for the reverse move and
will be described subsequently.

Because energy is included in the generation 
probabilities, proposed structures are preferentially of 
lower energy. Since the acceptance of proposed structures 
depends on their energies, the acceptance of proposed 
structures is thereby more probable. 
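
The selection and acceptance steps just described can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names `select_candidate` and `acceptance` are hypothetical, the external energies of the K candidate angles are assumed to have been computed already, and energies and `beta` are assumed to be in consistent units.

```python
import math
import random

def select_candidate(ext_energies, beta, rng=random):
    """Pick one of the K candidate angles with probability
    exp(-beta * u_ext) / w_ext; return (chosen index, w_ext)."""
    boltz = [math.exp(-beta * u) for u in ext_energies]
    w_ext = sum(boltz)
    r = rng.random() * w_ext
    acc = 0.0
    for k, b in enumerate(boltz):
        acc += b
        if r < acc:
            return k, w_ext
    return len(boltz) - 1, w_ext  # guard against floating-point round-off

def acceptance(w_new_factors, w_old_factors):
    """min(1, W_new / W_old), each W being the product of the
    per-rigid-unit normalizations w_ext collected during growth."""
    return min(1.0, math.prod(w_new_factors) / math.prod(w_old_factors))
```

Because low-energy candidates carry larger Boltzmann factors, they are chosen more often, which is the bias that the acceptance rule then corrects for.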
Peptide memory representation details

It is well known that at body temperature peptides
consist of linked rigid units capable only of torsional
rotation about mutual bonds whose lengths and angles are
fixed. The torsional rotations respect any molecular
conformational constraints. See Cantor et al., Biophysical
chemistry part I: the conformation of biological
macromolecules, New York, W.H. Freeman and Co. (1980), which
is herein incorporated by reference. Table 2 lists the rigid
units encountered in the preferred embodiment of this
invention utilizing libraries of conformationally constrained
peptides. Table 2, where applicable, also lists dihedral
bond angles between incoming and outgoing bonds to a rigid
unit and the assigned unit type.

Table 2

Type   Chemical Structure   Bond angle (if applicable)

Backbone and side chain rigid units

A      -NH2
B         |
       -CαH-                70.5°
C      -CONH-               70.5°
D      -COOH

Side chain only rigid units

E      -CH2-                70.5°
F        |
       -CH-                 70.5°
G      -S-                  70.5°
H                           0°
I      -CH3
J      -OH
K      -SH
L      -NH2
M
N      -CONH2
O      -CN3H4
P      -C3N2H3
Q      -C8NH6





Table 3 illustrates the decomposition of all amino acid side 
chains into rigid units. Glycine is a special case, without 
a side chain. Proline is a special case with a side chain 
cyclically bonded to the backbone amino N. 



Table 3

Amino Acid       Rigid Units

Glycine          -CαH2- (SPECIAL CASE)
Alanine          -CH3
Arginine         -CH2-CH2-CH2-CN3H4
Aspartate        -CH2-COOH
Asparagine       -CH2-CONH2
Cysteine         -CH2-SH
Glutamate        -CH2-CH2-COOH
Histidine        -CH2-C3N2H3
Isoleucine       -CH(-CH3)-CH2-CH3
Leucine          -CH2-CH(-CH3)2
Lysine           -CH2-CH2-CH2-CH2-NH2
Methionine       -CH2-CH2-S-CH3
Phenylalanine    -CH2-C6H5
Serine           -CH2-OH
Threonine        -CH(-CH3)-OH
Tryptophan       -CH2-C8NH6
Valine           -CH(-CH3)-CH3
Tyrosine         -CH2-C6H4-OH



Fig. 10 illustrates a structurally correct but
geometrically inaccurate decomposition of the peptide
backbone CX6C into rigid units (inessential hydrogens have
been omitted). Rigid units are set off in boxes 121 and
their types 122 are indicated. Fig. 11 illustrates a
structurally correct but geometrically inaccurate
decomposition of the peptide backbone and side chains of
-arginine-glycine-aspartate- ("RGD") into rigid units. Rigid
units are set off in boxes 131 and their types 132 are
indicated.
Rigid units are represented as records in memory. The
data structure for a peptide comprises records for its
constituent rigid units linked together by data pointers
exactly as the actual rigid units in the peptide are
chemically linked. The record representing a rigid unit
comprises fields for: the type of the unit, pointers to
chemically bonded units, all atoms of the unit and their
spatial positions, the atoms of the unit that are the targets
of the incoming and outgoing bonds, the amino acid to which
the unit belongs, and the atomic composition of the unit.
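
One way such a record might look is sketched below in Python. The specification states that the preferred implementation language is C, so this is purely illustrative; the class name `RigidUnit`, the field names, and the helper `link` are assumptions that mirror the fields listed above, not the layout used in the program listing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(eq=False)
class RigidUnit:
    unit_type: str                       # Table 2 type letter, e.g. "B"
    amino_acid: str                      # amino acid the unit belongs to
    atoms: Dict[str, Vec3]               # atom name -> spatial position
    composition: Dict[str, int]          # element symbol -> atom count
    incoming_atom: Optional[str] = None  # target atom of the incoming bond
    outgoing_atoms: List[str] = field(default_factory=list)
    bonded: List["RigidUnit"] = field(default_factory=list, repr=False)

def link(a: RigidUnit, b: RigidUnit) -> None:
    """Record a chemical bond between two units in both directions,
    so the record graph mirrors the chemical linkage."""
    a.bonded.append(b)
    b.bonded.append(a)
```

Linking the records in both directions lets a move generator walk outward from any chosen unit along the backbone or a side chain.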

A known, conventional representation of atoms and atomic
interactions is taught by the AMBER references. Each atom is
divided into a series of subtypes of specific properties.
For example, for carbon there are subtypes C, C2, CA, CT,
etc.; for nitrogen, there are N, N2, etc.; for oxygen, there
are O, O2, etc.; and for hydrogen, there are H, H2, etc.
Bonds between each pair of subtypes are separately
characterized by equilibrium lengths, angles, and torsional
energies. Interactions between each pair of subtype atoms
are separately characterized by Lennard-Jones force
parameters, hydrogen bonding force parameters, and
electrostatic charges. Amino acid charge distributions are
in Weiner et al., J. of Computational Chem., 7:230-52
(1986).

Thus each atom in each rigid unit is represented by an
in-memory record comprising fields for its AMBER reference
subtype and any electrostatic charge. The atom's spatial
position relative to its containing rigid unit, stored in
that unit's record, is geometrically determined from the
unit's internal chemical structure and bonds by the AMBER
bond lengths and angles defined for each of these bonds. The
relative spatial positions of atoms within a rigid unit are,
of course, fixed, and there is no interaction energy to
consider between atoms within a rigid unit.

Fig. 11 is a complete memory representation of a
tripeptide sequence -RGD- (a known pharmacophore). Rigid
units are set off in boxes 131 and their types 132 are
indicated. The torsional degrees of freedom between the
rigid units are indicated by angle arrows 133. AMBER atom
types are indicated as at 134. Net atomic charges are
indicated only for arginine as at 135. Rigid unit records
are linked into a data structure modeling the rigid units'
physical linkages. Not shown are the relative atomic spatial
positions, represented by the atoms' rectangular coordinates.

All parameters defining the AMBER atomic representations
and interatomic forces can be found in Weiner et al., J. of
Computational Chem., 7:230-52 (1986), and Weiner et al., J.
Amer. Chem. Soc., 106:765 (1984). Conventionally, these
parameters are obtained from computer readable files from
commercial sources. The preferred computer readable source
of these parameters is Insight II® 2.3.5 software from
BIOSYM (San Diego, CA). Other sources are Tripos (St. Louis,
MO) and CHARMm (Molecular Simulations, Inc., Burlington, MA).

Interaction energy evaluation details 

The form of the intramolecular energy, or Hamiltonian,
evaluated at step 93, is an important element of this
invention. The Hamiltonian consists of the components:

$$
H_{total} = \sum_{l \in \mathrm{binders}} H_{l,total}, \qquad
H_{l,total} = H_{l,molecular} + H_{l,NMR} + H_{l,consensus} \qquad (8)
$$

The $H_{l,molecular}$ component is determined from the Weiner
et al. references, J. of Computational Chem., 7:230-52 (1986),
and J. Amer. Chem. Soc., 106:765 (1984).

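$$
H_{l,molecular} =
\sum_{\substack{\text{rigid unit} \\ \text{torsional angles } i}} \frac{V_{i,n}}{2}\left[\cos(n\phi_{l,i}) + 1\right]
+ \sum_{\text{atom pairs } ij} \left[\frac{A_{ij}}{R_{l,ij}^{12}} - \frac{B_{ij}}{R_{l,ij}^{6}}\right]
+ \sum_{\text{atom pairs } ij} \frac{q_i q_j}{\epsilon R_{l,ij}}
+ \sum_{\text{H-bond pairs } ij} \left[\frac{C_{ij}}{R_{l,ij}^{12}} - \frac{D_{ij}}{R_{l,ij}^{10}}\right] \qquad (9)
$$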
Here, $\phi_{l,i}$ is the i'th torsional angle between rigid
units of the l'th binder peptide, and $R_{l,ij}$ is the
interatomic distance between the i'th and j'th atoms in
different rigid units of the l'th binder. The first term in
this equation is the torsional energy of rigid units; the
second is the interatomic Lennard-Jones energy; the third is
the interatomic electrostatic energy; and the fourth is the
interatomic hydrogen bond energy. Rigid unit torsional
rotations directly change the first term. Such rotations
indirectly change all other terms as interatomic distances
change.

The AMBER parameters $V_{i,n}$, $A_{ij}$, $B_{ij}$, $q_i$,
$C_{ij}$ and $D_{ij}$ are obtained as stated above. The
effect of water is approximated in a known manner by setting
$\epsilon$ equal to $4\epsilon_0 r$ in the electrostatic
term, where r is distance (in Å) and $\epsilon_0$ is the
vacuum permittivity.

The distance constraint term, as described, makes
energetically unfavorable those moves which cause the
measured interatomic separations in the simulation to depart
from their measured values. If no measured values are
available, this term is simply omitted from the Hamiltonian.
Since this is not a physical energy and in simulation
equilibrium the binders should have the measured distances,
it is advantageous that this term make only a small
contribution to the equilibrium energy, no more than 10% of
the total energy and preferably approximately 2.5 to 5%.
Further, it is advantageous that the energetic disfavor be
weighted by the confidence in the measurements, so that
measurements having more confidence have a greater effect.

Many forms of this energy meet these criteria. The
preferred form is:

$$
H_{l,NMR} = \sum_{\substack{\text{observed} \\ \text{distance pairs } ij}} w_{l,ij} \left(R_{l,ij} - R^{(0)}_{l,ij}\right)^2 \qquad (10)
$$

where $R^{(0)}_{l,ij}$ is a measured distance in the l'th
binder peptide between atomic pair ij. This makes the
constraints appear as an elastic pseudo-bond with equilibrium
length as measured. The $w_{l,ij}$ are weights designed to
meet the above size criteria. In the preferred embodiment,
they are calculated with an overall multiplicative factor
limiting the contribution of $H_{l,NMR}$ to no more than
approximately 5% of the total equilibrated energy. Their
relative value is selected to reflect the lower reliability
of longer measurements. Thus if $R^{(0)}_{l,ij}$ is between 0
and 3 Å, $w_{l,ij}$ has a relative value of 1; if the
measurement is between 3 and 4.5 Å, the relative value is 2;
if between 4.5 and 7 Å, the value is 3; and if the distance
exceeds 7 Å, the term is dropped from the sum. Other
alternative weight assignments meeting the general criteria
are clearly possible.
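
As a minimal sketch, the measurement constraint energy for one binder could be evaluated as below. It assumes the elastic pseudo-bond is a simple quadratic in the departure from the measured distance, takes the relative weights as given inputs, and applies the 7 Å cutoff described above; the function name `nmr_restraint_energy`, the `scale` argument standing in for the overall multiplicative factor, and the tuple layout of `restraints` are all hypothetical.

```python
import math

def nmr_restraint_energy(coords, restraints, scale=1.0, cutoff=7.0):
    """Sum of scale * w * (R - R0)**2 over measured atom pairs.

    coords     : maps an atom label to its (x, y, z) position in angstroms
    restraints : iterable of (atom_i, atom_j, measured_distance, weight)
    Pairs whose measured distance exceeds `cutoff` are dropped.
    """
    total = 0.0
    for atom_i, atom_j, r0, weight in restraints:
        if r0 > cutoff:
            continue
        r = math.dist(coords[atom_i], coords[atom_j])
        total += scale * weight * (r - r0) ** 2
    return total
```

In practice `scale` would be tuned so that this term stays within the stated fraction of the total equilibrated energy.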



The consensus constraint term, as described, makes
energetically unfavorable those moves which cause the
candidate pharmacophore in each of the binders to depart from
an average, shared configuration. In simulation equilibrium,
when the candidate is the actual pharmacophore, the binders
share the pharmacophore structure and this term should be
small. Since this is not a physical energy, in the case
where the candidate pharmacophore is correct this term
should not be large compared to the total energy: in
equilibrium no more than 10% of the total energy, and
preferably approximately 5%. Further, the energetic disfavor
should preferably be weighted by the affinity of each binder
for the protein target, so that binders with greater affinity
have a greater energetic effect.

Many forms of this energy meet these criteria. The
preferred form is:

$$
H_{l,consensus} = \sum_{\substack{\text{pharmacophore} \\ \text{pairs } ij}} w'_{l,ij} \left(R_{l,ij} - R^{(c)}_{ij}\right)^2, \qquad
R^{(c)}_{ij} = \frac{1}{N} \sum_{l \in \mathrm{binders}} R_{l,ij} \qquad (11)
$$





$R^{(c)}_{ij}$, the shared consensus structure for the
candidate pharmacophore, is an average of the interatomic
distances between corresponding atomic positions, ij, in the
shared pharmacophore in all binders. This makes the
constraints appear as pseudo-bonds to a shared pharmacophore,
which represents the binding to the protein target. The
$w'_{l,ij}$ are weights designed to meet the above size
criteria. In the preferred embodiment, they are calculated
with an overall multiplicative factor limiting the
contribution of $H_{l,consensus}$ to no more than
approximately 5% of the total equilibrated energy. Their
relative value is selected to reflect that binders with lower
affinity are less reliable indicators of actual pharmacophore
structure. Thus the relative value of the weights is
proportional to the logarithm of the affinity of the
corresponding binder, with an affinity of 1 μmolar having a
relative weight of 1. Other weight assignments meeting the
general criteria are clearly possible. The heuristic
$H_{consensus}$ is the only Hamiltonian term linking together
the various binders.
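
The coupling between binders can be sketched in Python as follows, assuming the consensus distance for each pharmacophore atom pair is the plain mean over the N binders and that the penalty is quadratic in the departure from that mean. The function names and the use of a single weight per binder are illustrative assumptions; atom labels are taken to correspond across binders.

```python
import math

def consensus_distances(binders, pairs):
    """Average each pharmacophore pair distance over all binders.

    binders : list of dicts mapping an atom label to (x, y, z)
    pairs   : list of (label_i, label_j) pharmacophore atom pairs
    """
    return {
        (i, j): sum(math.dist(b[i], b[j]) for b in binders) / len(binders)
        for (i, j) in pairs
    }

def consensus_energy(binder, shared, weight):
    """Quadratic penalty for one binder's departure from the shared
    distances, scaled by that binder's (affinity-derived) weight."""
    return sum(weight * (math.dist(binder[i], binder[j]) - rc) ** 2
               for (i, j), rc in shared.items())
```

Since the shared distances depend on every binder, moving one binder changes this term for all of them, which is how the otherwise independent simulations are tied together.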

All Hamiltonian components change only due to the
dependence of the interatomic distances, $R_{l,ij}$, on the
rigid units' torsional rotations. The $R_{l,ij}$ are the well
known Euclidean distances between the atomic coordinates
stored in the rigid unit records. Calculation of coordinate
changes due to rotation by angle $\phi$ about a bond with
unit direction $\underline{n}$ originating at atom A with
position $\underline{x}$ is well known, but will be detailed.
(Throughout, symbols representing vector quantities are
indicated by underlining.) First, translate from the current
coordinate origin to an origin at position $\underline{x}$ by
subtracting $\underline{x}$ from all relevant coordinate
vectors. Second, apply a rotation matrix, T, to the atomic
coordinate vectors. Third, translate back to the prior
coordinate origin by adding $\underline{x}$ to all relevant
coordinate vectors. A rotation matrix is given by:
$$
T = \cos(\phi)\, I + \underline{n}\,\underline{n}^{T} \left[1 - \cos(\phi)\right] + M \sin(\phi), \qquad
M = \begin{pmatrix} 0 & -n_z & n_y \\ n_z & 0 & -n_x \\ -n_y & n_x & 0 \end{pmatrix} \qquad (12)
$$
A reference for this computation is Goldstein, Classical
mechanics, Massachusetts, Addison-Wesley (1981), especially
chapter 4, which is herein incorporated by reference.
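
The rotation of an atom about a bond can be sketched in plain Python as below. The sketch builds the rotation matrix from the identity, the outer product of the unit bond direction with itself, and the antisymmetric cross-product matrix, then moves the origin to the bond's starting atom, rotates, and moves back. The function names are hypothetical, and the bond direction is assumed to be already normalized.

```python
import math

def rotation_matrix(n, phi):
    """3x3 matrix rotating by angle phi (radians) about unit vector n."""
    nx, ny, nz = n
    c, s = math.cos(phi), math.sin(phi)
    cross = [[0.0, -nz, ny],
             [nz, 0.0, -nx],
             [-ny, nx, 0.0]]
    return [[c * (1.0 if a == b else 0.0)
             + (1.0 - c) * n[a] * n[b]
             + s * cross[a][b]
             for b in range(3)]
            for a in range(3)]

def rotate_about_bond(point, origin, n, phi):
    """Rotate `point` by phi about the axis through `origin` along n."""
    t = rotation_matrix(n, phi)
    d = [point[a] - origin[a] for a in range(3)]       # shift origin to the atom
    return tuple(origin[a] + sum(t[a][b] * d[b] for b in range(3))
                 for a in range(3))                    # rotate, then shift back
```

For example, rotating the point (2, 1, 0) by 90° about a z-directed bond through (1, 1, 0) moves it to approximately (1, 2, 0).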

Type I move generation 

Type I moves alter the side chain structure of a randomly
chosen amino acid in a randomly chosen binder. These random
choices are conventionally made by a random number
subroutine. The chosen side chain is "removed" from the
binder peptide and "grown" back rigid unit by rigid unit.
For the next, i'th, rigid unit to be added, K possible new
torsional angles are generated according to $p^{int}$.
Preferably K is from 10 to 100. One of these torsional angles
is selected according to $p^{ext}$, and the rigid unit is
added at this new angle. Determination of $p^{ext}$ requires
obtaining the normalization $w_i^{ext}$. At each step the
$u^{int}$ and $u^{ext}$ used to calculate the respective
probabilities include only interaction energies with rigid
units present in other amino acids or already grown back.
Rigid units not yet added are ignored. After all the side
chain rigid units have been added back, $W^{new}$ is computed
as the product of the normalization factors.

Fig. 12 illustrates a Type I move for glutamate. At 141
the side chain has been removed. The first -CH2- unit is
added back at 142 with new torsional angle $\phi_0$. The
generation according to $p^{int}$ and selection according to
$p^{ext}$ of this angle ignores energy interactions with the
other side chain rigid units not yet added. At 143, the next
-CH2- rigid unit is added back at angle $\phi_1$. Finally at
144, the last -CO2 rigid unit is added at angle $\phi_2$. For
this last step interaction energies with all the rigid units
are considered in generating and selecting the new angle.

$W^{old}$ is the weight for the reverse move, the move from
the proposed new structure to the current configuration. For
this, the proposed side chain is removed and regrown in its
current structure unit by unit. For the next, i'th, unit,
K-1 possible new torsional angles are generated according to
$p^{int}$, again ignoring interactions with units yet to be
added. The K'th new angle is the current angle for that unit.
The current torsional angle is selected. Although $p^{ext}$
is not used, the normalization $w_i^{ext}$ is determined.
After all units have been regrown at the current angles,
$W^{old}$ is computed as the product of the normalizations.

The acceptance probability for the proposed side chain
configuration is determined from equation 7 using $W^{new}$
and $W^{old}$.
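
The reverse-move weight can be sketched in Python as below. The callables `ext_energy(i, phi)` and `propose_angle(i)` are hypothetical stand-ins: the first is assumed to return the external energy of unit i placed at angle phi while counting only units already regrown, and the second is assumed to draw a trial angle according to the internal distribution.

```python
import math

def reverse_weight(current_angles, ext_energy, propose_angle, K, beta):
    """Weight of the reverse move: regrow each unit at its current angle,
    using K-1 fresh trial angles plus the current angle as the K'th."""
    w_old = 1.0
    for i, phi_current in enumerate(current_angles):
        trials = [propose_angle(i) for _ in range(K - 1)] + [phi_current]
        # Normalization over all K angles; no selection is performed here
        # because the current angle is always the one kept.
        w_old *= sum(math.exp(-beta * ext_energy(i, phi)) for phi in trials)
    return w_old
```

Including the current angle among the K trials is what makes the forward and reverse weights comparable in the acceptance ratio.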

Type II move generation 

Type II moves alter a limited region of the amino acid
backbone beginning at a randomly chosen backbone rigid unit
of a randomly chosen binder peptide in a manner consistent
with conformational constraints due to internal disulfide
bonds. These random choices are made similarly to those for
Type I moves.

In Type II moves, side chains attached to the altered
rigid units move rigidly with their backbone rigid units.

For this move, important geometric constraints must be
met. In a randomly chosen binder and at a randomly chosen
backbone bond between adjacent rigid units, a torsional angle
rotation by $\phi_0$ is made. Subsequent backbone torsional
rotations are chosen so that a minimum number of rigid units
undergo a spatial displacement. This constraint fixes a
limited number (if any) of possible subsequent torsional
angles as a function of $\phi_0$ so that at most 4 rigid
units are spatially displaced and rotated, with at most 3
additional rigid units undergoing a rotation. This move is an
important aspect of this invention and is required to
maintain the conformational constraint due to the disulfide
bridge. Since only 7 rigid units are spatially modified, the
Type II move preserves the 8 amino acid cycle (20 rigid
units), including the cystine side chain.
Fig. 13 illustrates a Type II move of a poly-glycine
7-mer. Rigid unit positions are indicated generally by black
circles as at 1509 with incoming bonds generally as at 1502.
A Cα rigid unit (B unit) is illustrated in box 1515, and an
amide bond (C unit) in box 1516. Backbone structure 1500 is
transformed into structure 1501 by the Type II move generated
by an initial rotation about bond 1502. Subsequent rotations
about bonds 1503, 1504, 1505, 1506, 1507, and 1508 are
thereby determined so that the rigid unit 1510 and at most
three subsequent units undergo only a rotation without any
spatial displacement. The four rigid units between units
1509 and 1510 undergo both a spatial displacement and a
rotation as structure 1500 is transformed to structure 1501.
No other backbone rigid units are altered.

The derivation of these assertions, including
expressions for the allowed angles, is in Section 8,
Appendix: Concerted Rotation. Fig. 14 defines notation used
in that Appendix. Poly-glycine 7-mer backbone 1600 is the
same as in Fig. 13. Rigid unit positions are indicated
generally by black circles as at 1601 with incoming bonds
generally as at 1602. The torsional rotations $\phi_0$ to
$\phi_6$ are about bonds 1602 to 1608, respectively, between
sequential, adjacent rigid units. The rigid unit position
vectors $\underline{r}_0$ to $\underline{r}_6$, illustrated
as vectors 1610 to 1616, respectively, define the positions
of these sequential rigid units with respect to a laboratory
coordinate system with origin 1609. Summarizing the
Appendix, the determination of the fixed torsional angles
proceeds as follows. The allowed values for $\phi_1$ are the
roots of equation 34, which depends on the $\phi_0$ driver
angle and $\phi_2$ through $\phi_4$. But $\phi_2$ through
$\phi_4$ can be determined in terms of $\phi_1$. Two
solutions for $\phi_2$ are determined by equation 25 in terms
of $\phi_1$. Two solutions for $\phi_3$ are determined by
equation 29 in terms of the preceding $\phi$'s. Finally, a
simple inversion of equation 32 determines one solution for
$\phi_4$ in terms of the preceding $\phi$'s. Having found the
allowed values of $\phi_1$, equations 25, 29, and 32 then
determine corresponding allowed values for the other
$\phi$'s, which in turn determine the alteration of the first
four rigid units caused by the $\phi_0$ initial rotation.

More precisely, final torsional angles $\phi_0$ to $\phi_6$
determine position vectors $\underline{r}_1$ to
$\underline{r}_4$ by applying rotation matrix 18 to
equations 17 to obtain new position vectors in the laboratory
coordinate system, the rotation matrices of equations 16 and
18 being determined by these final torsional angles.
Position vectors $\underline{r}_0$ and $\underline{r}_5$ to
$\underline{r}_6$ do not change. Then rigid unit 0 is
translated to position $\underline{r}_0$; aligned so that its
incoming bond axis is along the direction of the outgoing
bond of unit -1; and finally rigidly rotated so that the end
of its outgoing bond is at position $\underline{r}_1$. Rigid
unit 1 is then translated to position $\underline{r}_1$;
aligned so that its incoming bond axis is along the outgoing
bond of unit 0; and rigidly rotated so that the end of its
outgoing bond is at position $\underline{r}_2$. Rigid units 2
to 6 are then added to the backbone in a similar fashion. In
this fashion the Type II move geometry is determined. Any
side chains attached to these rigid units are rigidly rotated
when their parent unit is rotated.

The Type II rotation is chosen in the following manner.
Using the configurational bias prescription, the Hamiltonian
is divided into $u^{int}$ and $u^{ext}$. $u^{int}$ is
preferably 0, or alternatively is the torsional energy
associated with the rigid unit of interest, while $u^{ext}$
includes all remaining interaction energies. In the previous
manner, $u^{int}$ determines $p^{int}$, according to which
are generated K' candidate $\phi_0$ rotation angles.
Preferably K' is 1. Then the geometric constraints are solved
for each candidate $\phi_0$. Typically, but not always, 6K',
denoted K, possible backbone alterations are obtained. One of
these is selected by $p^{ext}$, determined by:
$$
p^{ext}(k) = \frac{\exp\left[-\beta u^{ext}(k)\right]}{W}, \qquad
W = \sum_{k=1}^{K} \exp\left[-\beta u^{ext}(k)\right] \qquad (13)
$$



$u^{ext}$ includes all interactions not in $u^{int}$, that
is, all other backbone and side chain interactions. Because
these determinations occur in torsional angle space and
change the volume element in that space, the Jacobian,
determined by equation 35, of the selected Type II move is
also needed as a weight in the acceptance probability for
detailed balance. This acceptance probability for Type II
moves is:



$$
\mathrm{accept}(\mathrm{curr} \to \mathrm{prop}) = \min\left[1, \frac{W^{new} J^{new}}{W^{old} J^{old}}\right] \qquad (14)
$$



The weight and Jacobian of the reverse transformation
from the proposed to the current structure are also needed in
the acceptance probability for Monte Carlo detailed balance.
These quantities are determined as follows. Using the
proposed backbone structure just selected as the basis,
generate a set of K'-1 new $\phi_0$ torsional angles
according to $p^{int}$ and also include the current $\phi_0$
in the set. Then solve the geometric constraint to determine
the permitted alterations. The current configuration, since
it exists, must be among the permitted structures. From this
set of permitted structures determine $W^{old}$ per
equation 13. Then select the current configuration and
compute the Jacobian $J^{old}$ per equation 35. This
completes the determination of the acceptance probability.

Proline is approximated. Proline is not subject to Type
I moves. However, proline is subject to normal Type II
moves, with its side chain bond to the amino nitrogen broken.
The side chain thus moves rigidly with its backbone rigid
unit as in a normal Type II move. To compensate for the
broken bond approximation, the Cα-N torsional energy
amplitude in the proline backbone is set at approximately
5 kcal/mole. (By contrast, the torsional energy of the Cα-N
bond in a typical amino acid is approximately 0.3 kcal/mole.)
This invention is adaptable to other suitable approximations
for proline. Alternatively, the proline side chain may be
subject to alterations which preserve its cyclicity, such as,
for example, by an extension of the constraint scheme just
described.

Program detailed description 

The following describes the construction and use of a
computer method and apparatus to perform the method of step
5. The listing of this code is included in a microfiche
appendix to this specification. Fig. 15 is a general view of
the computer system and its internal data and program
structures. To the left in Fig. 15 are the principal data
structures of this method. Current structures 1701 contains
the current structures of the N binders represented in memory
as described. Proposed structure 1702 contains working
memory areas used to generate a proposed new structure for
one binder peptide. Structures 1701 and 1702 would typically
be stored in RAM memory of the computer system, RAM memory
being memory directly accessible to processor fetches.
Stored structures 1703 contains similar memory
representations of all the peptide structures generated,
accepted, and selected for storage. This is typically stored
in permanent disk file(s).

Candidate pharmacophore structures 1704 are input to the
programs from either a disk file or the display and input
unit 1712. The identified candidate structures are used to
determine the $w'_{l,ij}$ in Eqn. 11.

Parameters 1705 comprises several parts. First are all
the AMBER atomic interaction definitions and parameters.
Second are standard representations of the amino acids,
including component rigid units and atomic charge
assignments. Third are parameters controlling the run.
These further comprise, by example, values for K and K', the
Type I/II move branching ratio, the number of moves made in
the simulation run, the simulation total energy record, etc.
The parameters would typically be loaded from disk file(s)
into RAM memory for manipulation during a simulation run.
Unit 1712 includes display and input devices for
monitoring and control. Depicted on the display are the
total number of moves made in the current run and the course
of the total energy, which is similar to that illustrated in
Fig. 9.

Processor 1711 is loaded with necessary programs prior
to a simulation run and executes the programs to perform the
simulation method. The general structure consists of main
program 1706, structure modification program 1707, Type I and
II move generators 1708 and 1709, and subroutines 1710. The
subroutines consist of common utility subprograms, such as
for performing torsional rotations about bonds and computing
interaction energies by the previous methods, and
conventional library subprograms, such as for performing
input and output and finding random numbers. Any
scientifically adequate random number generator can be used.
A reference for random number generators is Press et al.,
Numerical recipes: the art of scientific computing,
Cambridge, U.K., Cambridge University Press (1986), chapter
7. The invention is equally adaptable to other program
structures that will occur to those skilled in computer
simulation arts.

The preferred embodiment of these structures is an Indigo
2 workstation from Silicon Graphics (Mountain View, CA).
Alternatively, any high performance workstation, such as
products of Hewlett-Packard, IBM or Sun Microsystems, could
be used. Preferably the data and program structures are
coded in the C computer language. Alternatively, any
scientifically oriented language, such as Fortran, could be
used. Conventional subroutine and scientific subroutine
libraries are used where appropriate.

depending on display of run progress results. Alternatively,
termination can be mechanically controlled. After completing
a certain number of total moves after run energy
equilibration, the moves being split between Types I and II
according to the specified branching ratio, the run is
terminated. The preferred number of total moves is 25,000,
and the preferred Type I/II branching ratio is 4. Thus it is
preferred to have 20,000 Type I and 5,000 Type II moves after
equilibration per simulation run.

At step 1810, the stored structures are analyzed to
determine both the consensus pharmacophore structure and the
structures of the remainder of the binders. In the preferred
embodiment, atomic positions in the equilibrated stored
structures for each peptide are averaged to obtain the
predicted geometric structure. The shared pharmacophore
structure is obtained from the predicted structure of each
peptide, again by averaging the shared position information
for all peptides. Alternatively, before structure averaging,
the structures generated for each binder can be clustered
into similar groups and the clusters for each peptide
separately averaged. The clusters would represent
alternative peptide folding patterns. It is anticipated that
because preferred binders are short peptides constrained by
disulfide bridges, any alternative foldings identified will
be structurally similar. The clustering can be done by the
exemplary methods found in the previously referenced article
Gordon et al., Fuzzy cluster analysis of molecular dynamics
trajectories, Proteins: Structure, Function, and Genetics
14:249-264 (1992). For all analysis methods, the choice of
the preferred number of stored moves is adjusted to achieve
adequate estimated statistical position errors. Further,
preferably, the results of three runs are combined to achieve
increased statistical confidence.
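
The averaging described for step 1810 can be sketched as follows. This is a minimal illustration only, assuming the stored structures are already expressed in a common coordinate frame; the function names, the tuple-based data layout, and the `shared_atoms` index map are hypothetical and do not come from the program listings.

```python
def average_structure(snapshots):
    """Average atomic positions over the stored structures of one peptide.

    snapshots: list of structures, each a list of (x, y, z) tuples,
    assumed (hypothetically) to share one coordinate frame.
    """
    n = len(snapshots)
    n_atoms = len(snapshots[0])
    return [
        tuple(sum(s[a][k] for s in snapshots) / n for k in range(3))
        for a in range(n_atoms)
    ]


def consensus_structure(predicted, shared_atoms):
    """Average the shared pharmacophore positions over all peptides.

    predicted:    dict peptide name -> averaged structure
    shared_atoms: dict peptide name -> atom indices of the shared
                  groups, listed in the same order for every peptide
    """
    names = list(predicted)
    n_sites = len(shared_atoms[names[0]])
    return [
        tuple(
            sum(predicted[p][shared_atoms[p][s]][k] for p in names) / len(names)
            for k in range(3)
        )
        for s in range(n_sites)
    ]
```

Clustering before averaging, as in the alternative embodiment, would simply call `average_structure` once per cluster instead of once per peptide.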

Other information is also output. Particularly
important is the course of the total energy for each peptide
and for all the peptides, and the intra-molecular, consensus,
and constraint components of the energies. These energy
components are used in determining whether a consensus
pharmacophore has been found. As previously described, this
is preferably done by ensuring that $H_{consensus}$ is small
compared to the total energy and is minimized by a particular
candidate pharmacophore. The constraint energy must also be
relatively small.

Finally at 1811, all results are output in a form usable
for the subsequent steps 6 and 7 of Fig. 1. For example,
this may be a particular file format suitable for subsequent
lead compound search by a database query.

Turning now to Fig. 17, structure modification program
1707 will be described. This is invoked from the main
program at 1804. Upon entry, this program randomly picks one
of the binder peptides at 1901 for which to generate a
proposed structure and also picks which type of move to use
at 1902. This latter random choice is made according to an
adjustable Type I/II branching ratio (preferably 4). For a
Type I move, step 1903 picks a random amino acid side chain
of the selected peptide, and step 1904 invokes the Type I
move program. (Proline has no Type I moves.) For a Type II
move, step 1905 picks a random backbone bond between rigid
units to rotate and also a random direction from the picked
bond along which backbone rigid unit structure will be
altered. Step 1906 invokes the Type II move program.
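
The random choices made at 1901 and 1902 can be illustrated with a short sketch. It assumes that a branching ratio of 4 means Type I moves are chosen four times as often as Type II moves, which matches the preferred 20,000/5,000 split; the function name `pick_move` and its arguments are hypothetical.

```python
import random


def pick_move(n_peptides, branching_ratio=4.0, rng=random):
    """Pick a binder peptide and a move type for one Monte Carlo step.

    With a Type I/II branching ratio b, a Type I move is chosen with
    probability b / (b + 1); b = 4 gives 0.8.
    """
    peptide = rng.randrange(n_peptides)            # step 1901
    p_type_1 = branching_ratio / (branching_ratio + 1.0)
    move = "I" if rng.random() < p_type_1 else "II"  # step 1902
    return peptide, move
```

Over 25,000 moves this rule yields 20,000 Type I and 5,000 Type II moves on average.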

Figs. 18A and 18B illustrate the Type I move generator
1708, which is defined by equations 6 and 7. With reference
first to Fig. 18A, the proposed structure of the selected
peptide is created from its current structure by removing the
selected side chain. All intra-molecular interactions are
subsequently determined with respect to the proposed
structure absent side chain rigid units not yet regrown. K
candidate new torsional angles for the next, i'th, rigid unit
to add are generated by $p^{int}$ at 2002. Preferably K is
between 10 and 100. Generation of these angles uses the
conventional rejection method referenced in Press et al. at
§ 7.3. The weight $w_i^{ext}$ and $p^{ext}$ are determined for
each of these candidate angles. This requires the rigid unit
to be added to be rotated to the candidate angle using the
previous rotation method. Candidate interaction energy is
determined from candidate interatomic distances resulting
from the candidate rotation. One of the candidate angles is
probabilistically selected at 2003 and the rigid unit added
back at this torsional angle at 2004. If there are more
units to add, which is tested at 2005, these steps are
repeated. If not, the acceptance weight $W^{new}$ is
determined as the product of the $w_i^{ext}$ at 2006. Lastly
the old weight is determined at 2007. From the weights the
move acceptance probability is found for use at 1805.
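
The regrowth loop of Fig. 18A can be sketched in the textbook configurational-bias form. Equations 6 and 7 are not reproduced in this section, so the Boltzmann form of the candidate weights, the acceptance rule min(1, W_new/W_old), and all function and parameter names below are assumptions made for illustration rather than the patent's exact definitions.

```python
import math
import random


def select_candidate(ext_energies, beta, rng=random):
    """Choose one of K candidate angles for a rigid unit.

    ext_energies: external interaction energy of each candidate.
    Returns (chosen index, w), where w is the sum of the Boltzmann
    factors and plays the role of the per-unit weight.
    """
    boltz = [math.exp(-beta * u) for u in ext_energies]
    w = sum(boltz)
    r = rng.random() * w
    acc = 0.0
    for k, b in enumerate(boltz):
        acc += b
        if r < acc:
            return k, w
    return len(boltz) - 1, w  # guard against round-off


def acceptance_probability(w_new_factors, w_old_factors):
    """Acceptance probability from the products of per-unit weights."""
    w_new = math.prod(w_new_factors)
    w_old = math.prod(w_old_factors)
    return min(1.0, w_new / w_old)
```

In this sketch one `select_candidate` call corresponds to steps 2002 through 2004 for a single rigid unit, and `acceptance_probability` combines the weights accumulated for the forward and reverse moves.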

Fig. 18B details the determination 2007 of $W^{old}$, the
weight for the reverse move from the proposed to the current
side chain structure. Temporarily the proposed structure is
used as a basis for energy determination at 2008, and then
the current structure is restored at 2016, when this process
is finished. The proposed side chain is removed at 2009 for
regrowth rigid unit by rigid unit as in Fig. 18A. For the
next, i'th, rigid unit to be added back, K-1 candidate angles
are generated according to $p^{int}$ at 2010, with the current
value of that angle for the K-th candidate at 2011. As
previously, the weight $w_i^{ext}$ is determined for these
candidate angles at 2012. The rigid unit is added back at
the current, K-th, angle at 2013. If there are more units to
add, tested at 2014, these steps are repeated. If not, the
acceptance weight $W^{old}$ is determined as the product of
the $w_i^{ext}$ at 2006.

Figs. 19A and 19B illustrate Type II move generator
1709, which is defined by equations 13 and 14 and the
concerted rotation geometric constraints. With reference to
Fig. 19A, K' candidate new torsional angles for the selected
backbone bond are generated by $p^{tot}$ using the rejection
method. Preferably K' is 1. Torsional rotations about
adjacent backbone bonds, in the selected direction along the
backbone, permitted by the concerted rotation constraints are
determined from the roots of equation 34 at 2102. Equation
34 depends on intermediate variables obtained from equations
25, 29, and 32 and determined in that order. The roots are
simply found by searching the interval $[-\pi, \pi]$ in 0.04°
increments. When a root is located in a 0.04° segment, it is
refined with the bisection method referenced in Press et al.
at § 9.1. It is expected on the average that six K'
solutions will be found. If no roots are found at 2103, the
candidate rotation is impossible and this move is skipped.
If solutions exist, next, at 2104, $p^{ext}$ and $W^{new}$ are
determined. Using the described rotation method, the
backbone rigid units are rotated (with consequent spatial
displacement of 4 units) to a candidate torsional angle
solution about their mutual bonds. Additionally, any side
chains attached to backbone rigid units are rigidly rotated
using the same method. Having made these rotations,
candidate interatomic distances and candidate interaction
energies can be determined and used to obtain $p^{ext}$ for
this candidate solution. One of the candidates is
probabilistically selected at 2104, and the backbone and any
side chains are rotated according to this candidate into the
proposed structure. The Jacobian of this transformation is
determined at 2106 by equation 35. Lastly the old acceptance
weight and Jacobian are determined at 2107. From the weights
and Jacobians the move acceptance probability is found for
use at 1805.

Fig. 19B details the determination 2107 of $W^{old}$ and
$J^{old}$ for the reverse move from the proposed to the
current side chain structure. Temporarily the proposed
structure is used as the basis for energy determination at
2008, and the current structure is restored at 2016, when
this process is finished. At 2109, a set of K'-1 candidate
torsional angles is generated for the selected backbone bond
according to $p^{tot}$ using the rejection method, and the
current torsional angle is added to this set. If, as
preferred, K' is 1, this step results in a set with only the
current angle. At 2111, similarly to 2102, the permitted
torsional rotations about adjacent backbone bonds are
determined from the equations expressing the concerted
rotation constraints. Special care is taken to ensure that
the original conformation is found by the root finding
procedure. In particular, the search interval is centered on
the known original $\phi_1$ and is made as small as necessary
to isolate the root, which may be as small as 0.004° or
smaller. The current structure must be among these
solutions, since it exists. It is selected at 2112.
$W^{old}$ is computed from the candidate angle solution,
making the candidate rotations and determining candidate
interactions. Also the Jacobian, $J^{old}$, of the
transformation is computed from the proposed to the current
structure.

5.8. CONSENSUS STRUCTURE TEST

Having selected a candidate pharmacophore and determined
a best possible consensus structure and best possible
structures for the remainder of the binder molecules, the
consensus test, step 6, tests whether a consensus structure
has actually been found. A consensus pharmacophore structure
consists of a spatial arrangement of chemically similar
groups shared by all the N binders to high accuracy. Since
an actual pharmacophore exists, the N specifically binding
members of the screened libraries will share the actual
structure. However, the remainder of binder molecules will
share no other similar structures to such a high accuracy.
Therefore, a structure consensus of the N binders is possible
only if the candidate pharmacophore is the actual physical
pharmacophore responsible for the actual binding. If the
candidate selected relates to other parts of the binder
molecules, no structure consensus will be found. Further, if
the Monte Carlo determination attempts to impose a consensus
on parts of the binder molecules that do not share structure,
an inconsistent overall structure will be obtained for the
remainder of the binder molecules.

Therefore, two preferred consensus tests are applied:
one test asks whether a consistent candidate pharmacophore
has been obtained, and a second test asks whether consistent
structures have been obtained for the remainder of the binder
molecules. Both tests have a preferred absolute and a less
preferred relative version.

There are two portions for the first test. First, are
all the consensus pharmacophore distances obtained in the N
binders within at least a specified distance, preferably
approximately 0.25 Å, of each other? Second, is the
consensus energy, $H_{consensus}$, relatively small compared
to the total molecular energy (e.g., less than at most
approximately 5-10% of the total molecular energy) as
determined by the Monte Carlo method?

There are also two portions of the second test. First,
can the intramolecular distances predicted by the Monte Carlo
method be confirmed by additional distance measurements?
Second, since the Monte Carlo method utilizes distance
constraints previously measured, one or more of these
measurement constraints can be ignored and the predicted
distance checked against that measured distance. Tolerances
for these tests are distance agreements of at least specified
distances, e.g., approximately 0.5 Å, in each binder.
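
The two portions of the first, absolute test can be expressed as a short check. This is only a sketch: it reads "within 0.25 Å of each other" as a bound on the spread of each pharmacophore distance across binders and uses the upper 10% figure for the energy fraction. Both readings, and the function names, are assumptions.

```python
def pharmacophore_distances_agree(distances_by_binder, tol=0.25):
    """First portion: each pharmacophore distance agrees across binders.

    distances_by_binder: one list of distances (in angstroms) per
    binder, with the distances listed in the same order for every binder.
    """
    n_dist = len(distances_by_binder[0])
    for d in range(n_dist):
        values = [binder[d] for binder in distances_by_binder]
        if max(values) - min(values) > tol:
            return False
    return True


def consensus_energy_small(h_consensus, h_total, max_fraction=0.10):
    """Second portion: consensus energy is a small part of the total."""
    return abs(h_consensus) <= max_fraction * abs(h_total)
```

The second test follows the same pattern, comparing predicted and measured intramolecular distances against a tolerance of approximately 0.5 Å.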

The two preferred tests have been described in the
absolute version as requiring checks against absolute
tolerances. Alternatively, the values of the pharmacophore
distance differences among the binders, $H_{consensus}$, and
the differences of the predicted and measured distances can
be accumulated for all the possible candidate pharmacophores,
the candidate selected being that one minimizing these
departures. Therefore, the selected candidate will have the
minimum values for the differences of the pharmacophore
distances in the binders, the minimum value for
$H_{consensus}$, and the minimum values of the differences of
predicted from measured distances.

This invention is adaptable to other tests that evaluate
the consistency of the consensus structure obtained for the
candidate pharmacophore and the accuracy of the structure
obtained for the remainder of the binder molecules.

5.9. LEAD COMPOUND DETERMINATION

Having started at step 1 with a target of interest, upon
completion of step 6 of Fig. 1 a high resolution
pharmacophore structure has been determined as well as
supporting structures of the N binder peptides. This high
resolution structure is used in step 7 to determine lead
compounds for use as a drug that will bind to the original
target of interest.

Thus, one or more lead compounds are determined that
share a pharmacophore specification with the determined
consensus pharmacophore structure. This determination can
preferably be done by one of several methods: by a search of
a database of potential drug compounds or of chemical
structures (e.g., the Standard Drugs File (Derwent
Publications Ltd., London, England), the Beilstein database
(Beilstein Information, Frankfurt, Germany or Chicago), and
the Chemical Registry (CAS, Columbus, OH)) to identify
compounds that contain the pharmacophore specification; by
modification of a known lead compound to include the
pharmacophore specification; by synthesizing a de novo
structure containing the pharmacophore specification; or by
modification of binders to the target molecule (e.g.,
isolated in step 2) outside of the pharmacophore structure to
render the binder more attractive for use as a drug (e.g., to
increase half-life, solubility, ability to achieve desired in
vivo localization).

Database search queries are based not only on chemical
property information but also on precise geometric
information. Computer-based approaches rely on database
searching to find matching templates; Y.C. Martin, Database
searching in drug design, J. Medicinal Chemistry, vol. 35, pp.
2145-54 (1992), which is herein incorporated by reference.
Existing methods for searching 2-D and 3-D databases of
compounds are applicable to this step. Lederle of American
Cyanamid (Pearl River, New York) has pioneered molecular
shape-searching, 3D searching and trend-vectors of databases.
Commercial vendors and other research groups have enhanced
searching capabilities [MACCS-3D, Molecular Design Ltd. (San
Leandro, CA); CAVEAT, Lauri, G. et al., University of
California (Berkeley, CA); CHEM-X, Chemical Design, Inc.
(Mahwah, N.J.)].

The pharmacophore structure determined in this invention
is adaptable to any of these methods and sources of chemical
database searching and to the enumerated non-database
methods. Output will be lead compounds suitable for drug
design. An important aspect of this invention is that the
high resolution pharmacophore structure will lead to highly
targeted leads. Lower resolution structures result in a
geometric increase in the number of lead compound query
matches. Example 1 illustrates this effect.

5.10. APPENDIX: CONCERTED ROTATION
Since the preferred molecules under consideration are
conformationally constrained by disulfide bridge(s), a Monte
Carlo move that preserves this constraint is required. The
"concerted rotation" scheme used for alkanes can be extended
to allow rotation of the torsional angles in conformationally
constrained peptides. This appendix describes this
extension. Dodd et al. (1993) discusses the original,
restricted method. (The essential extensions are expressed
in equations 27, 28, and 34.) This method is directly
applicable to the cyclic residue of proline, and an
alternative embodiment of this invention would thermally
perturb proline with a move of similar geometric constraints.
Fig. 14 illustrates the geometry under consideration.
Illustrated backbone 1600 is a poly-glycine 7-mer. Rigid
unit positions are indicated generally by black circles as at
1601 with incoming bonds generally as at 1602. The torsional
rotations $\phi_0$ to $\phi_6$ are about bonds 1602 to 1608,
respectively, between sequential, adjacent rigid units. The
rigid unit position vectors $\mathbf{r}_0$ to $\mathbf{r}_6$,
illustrated as vectors 1610 to 1616, respectively, define the
position of these sequential rigid units with respect to a
laboratory coordinate system with origin 1609. A $C_\alpha$
rigid unit (B unit) is illustrated in box 1630, and an amide
bond (C unit) in box 1631.

To formulate this method, let us consider rotating about
seven torsional angles, which will displace the root
positions and rotate four rigid units, rotate up to three
additional ones, and leave the rest of the peptide fixed.
The root position of a rigid unit is the $C_\alpha$ position
for a B unit, the C position for a C unit, the C position for
a CH$_2$ unit, and the S position for the S unit in cystine.
If unit 5 is a C unit, however, $\mathbf{r}_5$ is defined to
be the backbone amino nitrogen position of that unit. For
each unit, let us define $\theta_i$ to be the fixed angle
between the incoming and outgoing bonds. Thus,
$\theta_i = 0$ for a C unit, and $\theta_i = 70.5°$ for all
others.

The method leaves the positions of units $i \le 0$ or
$i \ge 5$ fixed. The torsion $\phi_0$ is changed by an amount
$\delta\phi_0$. The values of $\phi_i$, $1 \le i \le 6$, are
then determined so that only the positions $\mathbf{r}_i$ of
units $1 \le i \le 4$ are changed.

The method requires several definitions to present the
solution for the new torsional angles. The bond vectors are
defined to be the difference in position between unit i and
unit i - 1, as seen in the coordinate system of unit i:

$$\mathbf{l}_i = \mathbf{r}_i^{(i)} - \mathbf{r}_{i-1}^{(i)} \qquad (15)$$

Bond vectors $\mathbf{l}_1$ to $\mathbf{l}_5$ are illustrated
in Fig. 14 at 1620 to 1624, respectively. The lengths and
orientations of the $\mathbf{l}_i$ are determined by rigid
unit structure and the length and angle AMBER parameters for
bonds between atom types. The coordinate system of unit i is
such that the incoming bond is along the $\hat{\mathbf{x}}$
direction. Thus $\mathbf{l}_i = l_i\hat{\mathbf{x}}$ if atoms
$\mathbf{r}_i$ and $\mathbf{r}_{i-1}$ are directly bonded to
each other, and $\mathbf{l}_i$ has x- and y-components
otherwise. Here $\hat{\mathbf{x}}$ is a fixed unit vector
along the x direction. Now define a rotation matrix that
transforms from the coordinate system of unit i+1 to unit i:

$$\mathbf{T}_i = \begin{pmatrix}
\cos\theta_i & \sin\theta_i & 0 \\
\sin\theta_i\cos\phi_i & -\cos\theta_i\cos\phi_i & \sin\phi_i \\
\sin\theta_i\sin\phi_i & -\cos\theta_i\sin\phi_i & -\cos\phi_i
\end{pmatrix} \qquad (16)$$
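
The frame-to-frame rotation of Eqn. 16 can be written directly as a small function. This sketch assumes the standard form of the matrix, in which the first and third rows are those legible in Eqn. 16 and the middle row is the one fixed by requiring a proper rotation; the function name is illustrative.

```python
import math


def rotation_matrix(theta, phi):
    """Matrix taking coordinates in the frame of unit i+1 to unit i.

    theta: fixed angle between incoming and outgoing bonds (radians)
    phi:   torsional angle about the bond (radians)
    """
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return [
        [ct, st, 0.0],
        [st * cp, -ct * cp, sp],
        [st * sp, -ct * sp, -cp],
    ]
```

Because the matrix is orthogonal, its inverse is its transpose, which is convenient when the inverse matrices of the later equations are needed.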



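The positions of the units in the frame of unit 1 are, thus,
given by:

$$\begin{aligned}
\mathbf{r}_1^{(1)} &= \mathbf{l}_1 \\
\mathbf{r}_2^{(1)} &= \mathbf{l}_1 + \mathbf{T}_1\mathbf{l}_2 \\
\mathbf{r}_3^{(1)} &= \mathbf{l}_1 + \mathbf{T}_1(\mathbf{l}_2 + \mathbf{T}_2\mathbf{l}_3) \\
\mathbf{r}_4^{(1)} &= \mathbf{l}_1 + \mathbf{T}_1(\mathbf{l}_2 + \mathbf{T}_2(\mathbf{l}_3 + \mathbf{T}_3\mathbf{l}_4))
\end{aligned} \qquad (17)$$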



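Further define the matrix that converts from the frame
of reference of unit 1 to the laboratory reference frame

$$\mathbf{T}_1^{lab} = [\cos\psi\,\mathbf{I} + \mathbf{n}\mathbf{n}^{T}(1 - \cos\psi) + \mathbf{M}\sin\psi]\,\mathbf{A} \qquad (18)$$

where

$$\mathbf{M} = \begin{pmatrix}
0 & -n_z & n_y \\
n_z & 0 & -n_x \\
-n_y & n_x & 0
\end{pmatrix} \qquad (19)$$

and

$$\mathbf{n} = \frac{\hat{\mathbf{x}} \times \mathbf{l}}{|\hat{\mathbf{x}} \times \mathbf{l}|}, \qquad
\cos\psi = \frac{\hat{\mathbf{x}} \cdot \mathbf{l}}{|\hat{\mathbf{x}}||\mathbf{l}|}, \qquad
\sin\psi = \frac{|\hat{\mathbf{x}} \times \mathbf{l}|}{|\hat{\mathbf{x}}||\mathbf{l}|}$$

where $\mathbf{l}$ is the axis of the bond coming into unit 1. The
matrix $\mathbf{A}$ is a rotation about $\hat{\mathbf{x}}$ and is defined as

$$\mathbf{A} = \begin{pmatrix}
1 & 0 & 0 \\
0 & c & -s \\
0 & s & c
\end{pmatrix} \qquad (20)$$

where

$$\begin{aligned}
c &= (l_{1y}\Delta r_y + l_{1z}\Delta r_z)/(\Delta r_y^2 + \Delta r_z^2) \\
s &= (-l_{1z}\Delta r_y + l_{1y}\Delta r_z)/(\Delta r_y^2 + \Delta r_z^2)
\end{aligned} \qquad (21)$$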
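
The bracketed factor of Eqn. 18 is a rotation by an angle psi about the axis n, built from the x unit vector and the incoming bond axis. The sketch below constructs only that factor and omits the additional rotation A; it assumes n is formed from the cross product of the x unit vector with the bond axis, and the function names are illustrative.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rotation_x_to(l):
    """Return cos(psi) I + n n^T (1 - cos(psi)) + M sin(psi).

    The result rotates the x unit vector onto the direction of l.
    The extra rotation A about x in Eqn. 18 is not applied here.
    """
    x = (1.0, 0.0, 0.0)
    norm_l = math.sqrt(sum(v * v for v in l))
    c = cross(x, l)
    norm_c = math.sqrt(sum(v * v for v in c))
    if norm_c < 1e-12:
        raise ValueError("bond axis is parallel to x; rotation axis undefined")
    n = [v / norm_c for v in c]
    cos_psi = l[0] / norm_l
    sin_psi = norm_c / norm_l
    m = [[0.0, -n[2], n[1]], [n[2], 0.0, -n[0]], [-n[1], n[0], 0.0]]
    return [
        [
            cos_psi * (1.0 if i == j else 0.0)
            + n[i] * n[j] * (1.0 - cos_psi)
            + m[i][j] * sin_psi
            for j in range(3)
        ]
        for i in range(3)
    ]
```

The first column of the returned matrix is the unit vector along the bond axis, which is a quick way to confirm the construction.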



Here AE * A [r/**] -I <x -X .) if unit 0 is a C unit. Otherwise, 



AX - 1,. 

The method proceeds by solving for $\phi_i$, $2 \le i \le 6$,
analytically in terms of $\phi_1$. Then a nonlinear equation is
solved numerically to determine which values of $\phi_1$, if any,
are possible for the chosen value of $\phi_0$.

The derivation proceeds in the coordinate system of unit
1, after it has been rotated by the chosen $\phi_0$. Define



(22) 



30 



If 6, x 0 and 6 5 * 0, one can see from Fig. 14 that the 
distance between unit 3 and unit 5 is known and equal to 



2 a (l4x COs6 4-l4y sine 4*l3x) 2 + 

U 4x sin6 4 +l 4y cose 4 +l 5y ) 2 



(23) 



But this distance can also be written as 



35 
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2 = IS-T^IJ 2 



(24) 



Equating these two results, two values of 4> 3 are possible 



* 2 T 



arcsinf^) - arctan (x y /x z ) - H (x z ) 



<J>2 = 7i -arcsinf^) - arctan lx y /x z ) - H (x z ) , 



10 



with 



15 The constant c a is given by 

g/ -x 2 - J 3 * 2x x (cos6 2 J 3x * sin8 2 2 3y ) 



20 



25 



-2 (sin8 2 J 3x -cos8 2 2 3y ) (x;+x x ) 



2\ 1/2 



, e J "0 J e 5 «'0 



i 3 ^J«x^s,cose 4 -x,cose 2> 0 

sin9 2 (x^x|) 1/2 
lX 5 -i 2 ) -(x £ -i s ) / J 6 -J 5 -2 4x cos8 < -x x (cose 2 J JX +sin6 2 J, v ) 



(25) 



(26) 



(sin8 2 i }x -cos9 2 2 3y ) (x 2 *x 2 ) 1/2 



. 8,*0,6, 



i 3x cosVx x (co S 8 2 i 3x + S in8 2 l 3y ) e o 
(sin8 2 J JX -cos8 2 J 3y ) (x 2 -rx 2 ) 1 ' 2 



(27) 



30 



where £ is given by Egn. 24 if 8 S * 0, and 2£ ■ Ii* 1 [Ii 1 *"] _1 (x« - 
Is) /l t if 8$ « 0 . Clearly for there to be a solution | c 3 1 < 1 . 
The last three equations for c 3 were determined by conditions 
similar to equating Eqns. 23 and 24. For e 3 * 0, 8 S * 0, the 
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x component of r 5 t3) - £ 3 m is known to be equal to (l 4x + 
l 5 cos6 4 ) . For 9 3 * 0, 6 5 = 0, the x component of x 5 <5) - £ 3 tS) is 
known to be equal to l Sx + l 4x cose«. For e 3 = 0, G £ - 0, the 
angle between £ 3 - £2 and £ £ - r 5 is known to be equal to 8«. 
5 To determine tf> 3 two expressions for |r 5 - r 4 | 2 are again 

equated to determine that: 

c 2 = J s'y 2 'J'^yx(cos8 3 i 4 ^sine 3 j 4y ) {28) 



2(sin6 3 2 4x -cos6 3 i ) [yl+yt) U2 



10 



4> 3 = arcsin(c 2 ) - arctan (y y /y T ) - H (y z ) 



^J 1 = n -arcsin(c 2 ) - arctan (y y /y z ) - # (y z ) , 



(29) 



15 



where y = ( 7\ _C-I 2 ) -2 3 . . Again, |c 2 | < 1 for there to be 



a solution. 

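If $\theta_5 \neq 0$, the value of $\phi_4$ can now be determined from:

$$\mathbf{r}_5^{(1)} = \mathbf{r}_4^{(1)} + \mathbf{T}_1\mathbf{T}_2\mathbf{T}_3\mathbf{T}_4\mathbf{l}_5. \qquad (30)$$

Defining

$$\mathbf{q}_3 = \mathbf{T}_3^{-1}\mathbf{T}_2^{-1}\mathbf{T}_1^{-1}[\mathbf{T}_1^{lab}]^{-1}(\mathbf{r}_5 - \mathbf{r}_4), \qquad (31)$$

the equations that define $\phi_4$ are given by

$$\begin{aligned}
q_{3y} &= \cos\phi_4\,(\sin\theta_4\, l_{5x} - \cos\theta_4\, l_{5y}) \\
q_{3z} &= \sin\phi_4\,(\sin\theta_4\, l_{5x} - \cos\theta_4\, l_{5y})
\end{aligned} \qquad (32)$$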
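
The pair of relations in Eqn. 32 fixes the torsional angle without ambiguity, since both its cosine and its sine are known. A minimal sketch, assuming the y and z components of the vector defined in Eqn. 31 are available as `q3y` and `q3z` (hypothetical argument names):

```python
import math


def solve_phi4(q3y, q3z, theta4, l5x, l5y):
    """Recover the torsional angle from its cosine and sine relations.

    Both components share the common factor
    sin(theta4) * l5x - cos(theta4) * l5y, so dividing it out leaves
    cos(phi4) and sin(phi4), which atan2 combines into one angle.
    """
    amp = math.sin(theta4) * l5x - math.cos(theta4) * l5y
    if abs(amp) < 1e-12:
        raise ValueError("common factor vanishes; angle is undetermined")
    return math.atan2(q3z / amp, q3y / amp)
```

Dividing by the common factor before calling `atan2` keeps the result correct when that factor is negative.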



30 

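This is a successful rotation if the position of $\mathbf{r}_6$ is
successfully predicted. That is, the equation

$$\mathbf{r}_6^{(1)} - \mathbf{r}_5^{(1)} = \mathbf{T}_1\mathbf{T}_2\mathbf{T}_3\mathbf{T}_4\mathbf{T}_5\mathbf{l}_6 = [\mathbf{T}_1^{lab}]^{-1}(\mathbf{r}_6 - \mathbf{r}_5) \qquad (33)$$

must be satisfied. Consider the x-component, which implies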
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F 5 (4> a ) 



(z ( 6 1) -x ( 5 1, ) T r 1 r 2 r3r 4 i-(2 6Jc cose 5 ^j 6y sine 5 ) =o, e 5 #o 
(j: 4 -x 3 ) -(x fc -x 5 ) -J«i 6 cos6 4 =o, 83*0,65=0 

|x r xJ-[{i 6x -i 5x ) 2 -J s 2 y ] 1/2 -o, e 3 =o,e 5 =o 



(34) 



15 



20 



must be satisfied if the rotation is successful. The
equations for the case $\theta_5 = 0$ clearly express the
geometric conditions required for a successful rotation.

Eqn. 34 is the nonlinear equation for $\phi_1$ because
$\phi_2$, $\phi_3$, and $\phi_4$ are determined by Eqns. (25),
(29), and (32) in terms of $\phi_1$. This equation has between
zero and four values for each value of $\phi_1$, however, due
to the multiple root character of Eqns. (25) and (29). The
equation is solved by searching the region
$-\pi < \phi_1 < \pi$ for zero crossings. The search is in
increments of approximately 0.04°. These roots are then
refined by a bisection method.
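
The search strategy described above, a coarse scan for sign changes followed by bisection, can be sketched generically. The function `f` stands in for one branch of Eqn. 34; handling of angles where a branch has no solution is omitted, and the names and tolerance are illustrative.

```python
import math


def bisect(f, a, b, tol=1e-10):
    """Refine a root of f bracketed by a sign change on [a, b]."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0.0:
            b = m
        else:
            a, fa = m, fm
    return 0.5 * (a + b)


def find_roots(f, lo=-math.pi, hi=math.pi, step_deg=0.04, tol=1e-10):
    """Scan [lo, hi] in fixed increments and refine each zero crossing."""
    step = math.radians(step_deg)
    n = int(round((hi - lo) / step))
    roots = []
    a, fa = lo, f(lo)
    for i in range(1, n + 1):
        b = min(lo + i * step, hi)
        fb = f(b)
        if fa == 0.0:
            roots.append(a)           # grid point is itself a root
        elif fa * fb < 0.0:
            roots.append(bisect(f, a, b, tol))
        a, fa = b, fb
    if fa == 0.0:
        roots.append(a)
    return roots
```

A scan of this kind can miss roots that touch zero without a sign change or pairs of roots closer together than one increment, which is one reason a fine increment such as 0.04° is used.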

The transformation from $\phi_i$, $0 \le i \le 6$, to the new
solution, which is constrained to change only $\mathbf{r}_i$,
$1 \le i \le 4$, actually implies a change in volume element in
torsional angle space. This change in volume element is the
reason for the appearance of the Jacobian in the acceptance
probability. The Jacobian of this transformation is
calculated in Dodd et al. (1993) at pp. 991-93. It is
slightly different here since root position $\mathbf{r}_5$ is
not necessarily the head position. The Jacobian is given by

where the 5x5 matrix B is given by B ld = x (x s - for 
i 

< 3 and B i5 - = x (£ $ - £ s )/\z< - XsNio for i « 4,5. Here & 
■ Xi# except that fc 5 is the head position even if e s « 0, and 
Mi is the incoming bond vector for unit i. 



Repeated application of the concerted rotation may lead
to a slightly imperfect structure, due to numerical precision
errors. In an alternative embodiment, peptide geometry would
be restored to an ideal state by application of the Random
Tweak algorithm after several thousand moves (Shenkin et al.,
1987, Biopolymers 26:2053-85).

The invention is further described in the following
examples which are in no way intended to limit the scope of
the invention.

6. EXAMPLES

6.1. RELATION BETWEEN EFFECTIVENESS OF
POTENTIAL DRUG IDENTIFICATIONS AND
PHARMACOPHORE GEOMETRIC TOLERANCE

Searches of a drug library well known to medicinal
chemists, the Standard Drugs File (Derwent Publications Ltd.,
London, England), illustrate the geometric increase in the
number of compounds found (and thus decrease in expected
effectiveness of identification of potential drugs) as
pharmacophore geometric tolerance is increased. Table 4
tabulates the results.

Table 4

5HT3 (5-Hydroxytryptophan)
Tolerance (Å)    Number of drug compounds
2.0              64
1.0              35
0.5              27
0.25             12
0.10             1

Dopamine
Tolerance (Å)    Number of drug compounds
2.0              188
1.0              185
0.5              60
0.25             48
0.10             5

The pharmacophores are two well known neurotransmitters,
5-hydroxytryptophan and dopamine. As the tolerance of one
distance in the pharmacophore structure is decreased from 2.0
to 0.1 Å, the number of compounds retrieved from the database
is listed. The advantage of achieving pharmacophore
resolution better than approximately 0.25 Å is clear.

If the tolerance of three distances were involved, the
expected number of compounds retrieved would be the cube of
these numbers. For the dopaminergic pharmacophore, the
number of lead compounds would decrease from over
6.5 x 10^6 to about 125 as three tolerances were decreased
from 2.0 Å to 0.1 Å.
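
The cubing argument can be checked with the dopamine figures from Table 4. The helper below is a hypothetical illustration of that arithmetic, not part of the described method.

```python
def expected_hits(single_distance_hits, n_distances=3):
    """Expected retrievals when n independent distances share a tolerance."""
    return single_distance_hits ** n_distances


loose = expected_hits(188)  # 2.0 angstrom tolerance: 6,644,672 compounds
tight = expected_hits(5)    # 0.1 angstrom tolerance: 125 compounds
```

The first value is the "over 6.5 x 10^6" quoted above, and the second is the "about 125".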

This example illustrates the geometric increase in the
number of leads identified as pharmacophore geometry is less
well defined. It is thus a very preferred aspect of this
invention that the computational method results in
determining pharmacophore structure accurate to at least
approximately 0.25 to 0.30 Å. Thus an exponentially large
improvement in lead compound selection for drug design can be
expected to result from this invention.

6.2. EXPRESSION AND PURIFICATION
OF TARGET PROTEINS

Target molecules that are proteins, for example ras,
raf, VEGF and KDR, are expressed in the Pichia pastoris
expression system (Invitrogen, San Diego, CA) and as
glutathione-S-transferase (GST)-fusion proteins in E. coli
(Guan and Dixon, 1991, Anal. Biochem. 192:262-267).
The cDNAs of these target proteins are cloned in the
Pichia expression vectors pHIL-S1 and pPIC9 (Invitrogen).
Polymerase chain reaction (PCR) is used to introduce six
histidines at the carboxy-terminus of these proteins, so that
this His-tag can be used to affinity-purify these proteins.
The recombinant plasmids are used to transform Pichia cells
by the spheroplasting method or by electroporation.
Expression of these proteins is inducible in Pichia in the
presence of methanol. The cDNAs cloned in the pHIL-S1
plasmid are expressed as a fusion with the PHO1 signal
peptide and hence are secreted extracellularly. Similarly,
cDNAs cloned in the pPIC9 plasmid are expressed as a fusion
with the α-factor signal peptide and hence are secreted
extracellularly. Thus, the purification of these proteins is
simpler as it merely involves affinity purification from the
growth media. Purification is further facilitated by the
fact that Pichia secretes very low levels of homologous
proteins and hence the heterologous protein comprises the
vast majority of the protein in the medium. The expressed
proteins are affinity purified onto an affinity matrix
containing nickel. The bound proteins are then eluted with
either EDTA or imidazole and are further concentrated by the
use of centrifugal concentrators.

As an alternative to the Pichia expression system, the
target proteins are expressed as glutathione-S-transferase
(GST) fusion proteins in E. coli. The target protein cDNAs
are cloned into the pGEX-KG vector (Guan and Dixon, 1991,
Anal. Biochem. 192:262-267) in which the protein of interest
is expressed as a C-terminus fusion with the GST protein.
The pGEX-KG plasmid has an engineered thrombin cleavage site
at the fusion junction that is used to cleave the target
protein from the GST tag. Expression is inducible in the
presence of IPTG, since the GST gene is under the influence
of the tac promoter. Induced cells are broken up by
sonication and the GST-fusion protein is affinity purified
onto a glutathione-linked affinity matrix. The bound
protein is then cleaved by the addition of thrombin to the
affinity matrix and recovered by washing, while the GST tag
remains bound to the matrix. Milligram quantities of
recombinant protein per liter of E. coli culture are expected
to be obtainable in this manner.

5 

6.3* SYNTHESIS AND SCREENING OF POLYSOME -BASED 
LIBRARIES ENCODING RANDOM CONSTRAINED 
PEPTIDES OF VARIOUS LENGTHS 

6.3.1. PREPARATION OF DNA TEMPLATES 
DNA libraries with a high degree of complexity are made
as two components: an expression unit, and a semi-random (or
degenerate) unit. The expression unit has been synthesized
chemically as an oligonucleotide (termed T7RBSATG), and
contains the promoter region for bacteriophage T7 RNA
polymerase, a ribosome binding site, and the initiating ATG
codon. The random region, also synthesized as an
oligonucleotide (termed MNN6), contains a region complementary
to the expression unit, the antisense version of the codons
specifying Cys-X6-Cys, and a restriction site (BstXI). The
library is constructed by annealing 100 pmol of
oligonucleotide T7RBSATG [having the sequence
5' ACTTCGAAATTAATACGACTCACTATAGGGAGACCACA
AATITrGTTTAACTITAACTTTAAGAAGGAGATATACATATGCAT 3'
(SEQ ID NO:2)] and oligonucleotide MNN6 [having the sequence
5' CCCAGACCCGCCCCCAGCATTGTGGGTTCCAACGCCCTCTAGACA[MNN]6ACAATG
TATATCTCCTTCTT 3' (SEQ ID NO:3); M = A or C, N = G, A, T, or
C], and extending the DNA in a reaction mixture containing
10-100 units of Sequenase (United States Biochemical Corp.,
Cleveland, OH), all four dNTPs (at 1 mM), and 10 mM
dithiothreitol for 30 min at 37°C. The extended material is
then digested with BstXI, ethanol precipitated and
resuspended in water. This fragment of DNA is then ligated
via the BstXI end to a 250 base pair (bp), PCR-amplified
Glycine-Serine coding fragment derived from gene III of M13
bacteriophage DNA. The gene III fragment has been amplified
by use of two primers, respectively termed FGSPCR [having the
sequence 5' TCGTCTGACCTGCCTCAACCTCCCCACAATGCTGGCGGCGGCTCTGGT 3'
(SEQ ID NO:4)], and RGSPCR [having the sequence
5' ATCAAGTTTGCCTTTACCAGCATTGTGGAGCGCGTTTTCATC 3'
(SEQ ID NO:5)], and Taq DNA polymerase (Gibco-BRL). The
amplified DNA (250 bp) was cut with BstXI to yield a 200 bp
fragment that has been gel purified. The 200 bp fragment is
then ligated to the random peptide coding DNA fragment. This
DNA specifies the synthesis of a peptide of the sequence
Met-His-Cys-(X)6-Cys- (SEQ ID NO:6) fused to the Gly-Ser rich
region of the M13 gene III protein. The Gly-Ser rich domain
is thought to behave as a flexible linker and assist in
presentation of the random peptide to the target molecules.

To make constrained random peptides of different
lengths, oligonucleotides are made that are similar to MNN6,
except that the degenerate region is 5, 7, 8, or 9 codons
long. In addition, oligonucleotides are made that code for
various shapes of constrained random peptides by specifying
sequences comprising three cysteine residues interspersed
among 6-10 randomly specified amino acids.
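
The degenerate design described above can be illustrated with a short calculation. The following is a minimal sketch, not part of the disclosed method: it enumerates the antisense MNN codons (M = A or C, N = any base), converts them to the sense strand, and counts how many amino acids and stop codons they encode using the standard genetic code. The function and variable names are hypothetical, and the comparison against 100 pmol of oligonucleotide uses Avogadro's number as the only outside constant.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Antisense MNN codons as written in the oligonucleotide: M = A or C, N = any base.
antisense_codons = ["".join(c) for c in product("AC", "ACGT", "ACGT")]
# On the sense strand each becomes NNK, where K = G or T.
sense_codons = sorted({reverse_complement(c) for c in antisense_codons})

encoded = {CODON_TABLE[c] for c in sense_codons}
amino_acids = sorted(encoded - {"*"})
stop_codons = [c for c in sense_codons if CODON_TABLE[c] == "*"]


def library_sizes(n_codons):
    """Return (distinct DNA sequences, distinct peptides) for n degenerate codons."""
    return len(sense_codons) ** n_codons, len(amino_acids) ** n_codons


AVOGADRO = 6.02214076e23
molecules_in_100_pmol = 100e-12 * AVOGADRO

for n in (5, 6, 7, 8, 9):
    dna, peptides = library_sizes(n)
    print(f"{n} codons: {dna:.3e} DNA sequences, {peptides:.3e} peptides, "
          f"{molecules_in_100_pmol / dna:.3e} template molecules per DNA sequence")
```

Under these assumptions the sense-strand codons cover all 20 amino acids with a single stop codon, which is one reason this style of degenerate codon is commonly chosen for random peptide libraries.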

6.3.2. IN VITRO SYNTHESIS AND ISOLATION OF POLYSOMES

An E. coli S30 extract is prepared from the B strain
SL119 (Promega). Coupled transcription-translation reactions
are performed by mixing the S30 extract with the S30 premix
(containing all 20 amino acids), the linear DNA template
coding for peptides of random sequences (prepared as
described in Section 6.3.1 above), and rifampicin at 20
µg/ml. The reaction is initiated by the addition of 100
units of T7 RNA polymerase and continues at 37°C for 30 min.
The reaction is terminated by placing the reactions on ice
and diluting them 4-fold with polysome buffer (20 mM
Hepes-NaOH, pH 7.5, 10 mM MgCl2, 1.5 ng/ml chloramphenicol, 100
µg/ml acetylated bovine serum albumin, 1 mM dithiothreitol,
20 units/ml RNasin, and 0.1% Triton X-100). Polysomes are
isolated from a 50 µl reaction programmed with 0.5-1 µg of
linear DNA template specifying the synthesis of random
constrained peptides. To isolate polysomes, the diluted S30
reaction mixtures are centrifuged at 288,000 x g for 30-40
min at 4°C. The pellets are suspended in polysome buffer and
centrifuged a second time at 10,000 x g for 5 min to remove
insoluble material.

6.3.3. AFFINITY SELECTION/SCREENING OF POLYSOMES
The isolated polysomes are incubated in microtiter wells
coated with the target proteins. Microtiter wells are
uniformly coated with 1-5 µg of 6-His tagged, or glutathione
S-transferase fused, target proteins (see Section 6.2
hereinabove). Target proteins that are used include the
oncoproteins ras and raf, KDR (the vascular endothelial
growth factor [vEGF] receptor protein) and vEGF. The
microtiter wells are coated with 1-5 µg of these target
proteins by incubation in PBS (phosphate-buffered saline; 10
mM sodium phosphate, pH 7.4, 140 mM NaCl, 2.7 mM KCl), for
1-5 hours at 37°C. The wells are then washed with PBS, and the
unbound surfaces of the wells blocked by incubation with PBS
containing 1% nonfat milk for 1 hr at 37°C. Following a wash
with polysome buffer, each well is incubated with polysomes
isolated from a single 50 µl reaction for 2-24 hr at 4°C.
Each well is washed five times with polysome buffer and the
associated mRNA is eluted with polysome buffer containing 20
mM EDTA.

After affinity selection of the polysomes, the
associated mRNAs are isolated, and treated with 5-10 units of
DNase I (RNase-free; Ambion) for 15 min at 37°C after
addition of MgCl2 to 40 mM. The mRNA is phenol-extracted and
ethanol-precipitated and dissolved in 20 µl of RNase-free
water. A portion of the mRNA is used for cDNA preparation
and subsequent amplification using 15 pmol each of primers
RGSPCR [5' ATCAAGTTTGCCTTTACCAGCATTGTGGAGCGCGTTTTCATC 3'
(SEQ ID NO:5)] and SELEXF1
[5' ACTTCGAAATTAATACGACTCACTATAGGGAGACCACAACGGTTTCC 3'
(SEQ ID NO:9)] and the rTth Reverse Transcriptase RNA PCR kit
(Perkin Elmer Cetus). Specifically, the mRNA is
reverse-transcribed into cDNA in a 20 µl reaction containing 1 pg
mRNA, 15 pmol of RGSPCR primer, 200 µM each of dGTP, dATP,
dTTP, and dCTP, 1 mM MnCl2, 10 mM Tris-HCl, pH 8.3, 90 mM KCl,
and 5 units of rTth DNA polymerase at 70°C for 15 min. In
the next step, the cDNA is amplified by the addition of 2.5
mM MgCl2, 8% glycerol, 80 mM Tris-HCl, pH 8.3, 125 mM KCl,
0.95 mM EGTA, 0.6% Tween 20, and 15 pmol of the SELEXF1
primer. The reaction conditions that are employed are 2 min
at 95°C for one cycle, 1 min at 95°C and 1 min at 60°C for 35
cycles, and 7 min at 60°C for one cycle. The amplified
product is then gel-purified and quantitated by
spectrophotometry at 260 nm. A portion of the amplified DNA
is digested with NsiI and XbaI and the resulting 30 base pair
fragment is directionally cloned into a monovalent phage
display vector. The DNAs inserted in the monovalent phage
display vector are then sequenced to determine the identity
of the peptides that were selectively retained by one cycle
of affinity binding to the target protein. A second portion
(0.5-1 µg) of the amplified DNA is subjected to another cycle
of affinity selection, mRNA isolation, cDNA amplification,
and cloning.
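
The amplification conditions stated above (one cycle of 2 min at 95°C, 35 cycles of 1 min at 95°C and 1 min at 60°C, and one cycle of 7 min at 60°C) can be written down as a small data structure. The sketch below is only an illustration with hypothetical names; it sums the programmed hold times and ignores the ramp times between temperatures, which depend on the instrument.

```python
# Each entry: (number of cycles, [(temperature in deg C, hold time in minutes), ...])
PROGRAM = [
    (1, [(95, 2)]),            # initial denaturation
    (35, [(95, 1), (60, 1)]),  # two-step amplification cycles
    (1, [(60, 7)]),            # final extension
]


def total_hold_minutes(program):
    """Sum the programmed hold times; ramping between temperatures is not included."""
    return sum(
        cycles * sum(minutes for _temperature, minutes in steps)
        for cycles, steps in program
    )


print(f"Programmed hold time: {total_hold_minutes(PROGRAM)} min")
```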

6.4. PHAGEMID SCREENING 
Three different protocols for screening of a phagemid
library are presented in the subsections hereinbelow. These
protocols, particularly the immobilization and binding steps,
are readily adaptable to use for screening of different
libraries, e.g., polysome libraries. Preferably, different
methods are used in different rounds of screening.

6.4.1. PLATE PROTOCOL

In this example, a protocol is presented for screening a
phagemid library, in which in the first round of screening, a
biotinylated target protein is immobilized (by the specific
binding between biotin and streptavidin) on a streptavidin
coated plate. The immobilized target protein is then
contacted with library members to select binders.

Reagents Used:

Purified target protein, microfuge tubes, Falcon 2059,
Binding Buffer, Wash Buffer, Elute Buffer, phage display
library of >10^11 pfu/screened target, fresh overnight cultures
of appropriate host cells, LB agar plates with antibiotics as
needed, biotinylating agent NHS-LC-Biotin (Pierce Cat.
#21335), streptavidin, 50 mM NaHCO3 pH 8.5, 1 M Tris pH 9.1,
M280 Sheep anti-mouse IgG coated Dynabeads (Dynal), phosphate
buffered saline (PBS), Falcon 1008 petri dishes.

Wash Buffer = 1X PBS (Sigma tablets), 1 mM MgCl2, 1 mM CaCl2,
0.05% Tween 20. (For one liter: 5 PBS tablets, 1 ml 1 M MgCl2,
1 ml 1 M CaCl2, 0.5 ml Tween 20, nanopure H2O to 1 liter.)

Binding Buffer = Wash Buffer with 5 mg/ml bovine serum
albumin (BSA).

Elute Buffer = 0.1 N HCl adjusted to pH 2.2 with glycine;
1 mg/ml BSA.
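
The one-liter Wash Buffer recipe above can be checked against its stated final concentrations and scaled to other volumes. The following is a minimal sketch under the assumption that volumes are additive; the function names are hypothetical, and the dictionary simply restates the recipe given in the text.

```python
# Wash Buffer recipe for 1 liter, as listed in the text.
RECIPE_PER_LITER = {
    "PBS tablets": 5.0,        # tablets
    "1 M MgCl2 (ml)": 1.0,
    "1 M CaCl2 (ml)": 1.0,
    "Tween 20 (ml)": 0.5,
}


def scale_recipe(final_volume_ml):
    """Scale every component linearly to the requested final volume."""
    factor = final_volume_ml / 1000.0
    return {name: amount * factor for name, amount in RECIPE_PER_LITER.items()}


def final_millimolar(stock_molar, stock_ml, final_volume_ml):
    """C1*V1 = C2*V2, with the result expressed in mM."""
    return stock_molar * 1000.0 * stock_ml / final_volume_ml


def percent_v_v(component_ml, final_volume_ml):
    """Volume percent of a liquid component in the final buffer."""
    return 100.0 * component_ml / final_volume_ml


def bsa_mg_for_binding_buffer(final_volume_ml, mg_per_ml=5.0):
    """Mass of BSA needed to turn Wash Buffer into Binding Buffer (5 mg/ml)."""
    return final_volume_ml * mg_per_ml


print("MgCl2 (mM):", final_millimolar(1.0, 1.0, 1000.0))
print("Tween 20 (% v/v):", percent_v_v(0.5, 1000.0))
print("250 ml batch:", scale_recipe(250.0))
print("BSA for 50 ml Binding Buffer (mg):", bsa_mg_for_binding_buffer(50.0))
```

Scaling to a volume that requires a fractional number of PBS tablets is a practical limitation of this sketch; in that case a whole-tablet volume would normally be prepared instead.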

Procedure:
Protein Biotinylation:

1. Wash 50-100 µg of target protein in 50 mM NaHCO3 pH 8.5
in a Centricon (Amicon) of the appropriate molecular weight
cut-off.

2. Bring the total volume to 100 µl with 50 mM NaHCO3 pH 8.5.

3. Dissolve 1 mg of NHS-LC-Biotin in 1 ml H2O. Do not store
this solution.

4. Immediately add 37 µl of the NHS-LC-Biotin solution to
the target protein and incubate for 1 hr at room temperature
(RT).

5. Remove the unreacted biotin by washing 2X with PBS in a
Centricon (Amicon) of the appropriate molecular weight
cut-off. Store the biotinylated protein at 4°C.

Coating a 1008 Plate with Streptavidin:

6. The night before the binding experiment, precoat a 1008
plate with streptavidin.

7. Add 10 µg of streptavidin (1 mg/ml in H2O) per 1 ml of 50 mM
NaHCO3 pH 8.5.

8. Add 1 ml of this solution to each plate and place in a
humidified chamber overnight at 4°C.

Prebinding; Blocking Non-Specific Sites:

9. To a streptavidin coated plate add 400 µl of Binding
Buffer (BSA blocking) for one hour at room temperature.

10. Rinse wells six times with Wash Buffer, slapping dry
on a clean piece of labmat.

Binding; Specific Target/Phage Complexes Round 1:

11. Add 10 µg of biotinylated target protein in 400 µl of
Binding Buffer to the well and incubate for 2 hr at 4°C.

12. Add 4 µl of 10 mM biotin and swirl for 1 hr at 4°C.

13. Wash as in step 10.

14. Add concentrated phage library (>10^11 pfu) in 400 µl of
Binding Buffer and swirl overnight at 4°C.

Washing and Elution:

15. Slap out binding mixture and wash as in step 10.

16. To elute bound phage add 400 µl of Elution Buffer and
rock at RT for 15 min.

17. Transfer the elution solution to a sterile 1.5 ml tube
which contains 75 µl of 1 M Tris pH 9.1. Vortex briefly.

Amplification of Round 1 Eluted Phage:

18. Plate all of the eluted round 1 phage by adding 157 µl
of phage to 200 µl of cells incubated overnight (previously
checked to be free of contamination) in three aliquots. Incubate
25 min in a 37°C water bath and then spread onto LB
agar/antibiotics plate containing 2% glucose.

19. Scrape plates with 5 ml of 2XYT (growth broth)/
Antibiotics/Glucose and leave swirling for 30 min at RT.

20. Add the appropriate amount of 2XYT/Antibiotics/Glucose
to bring the O.D. 600 down to 0.4 and then grow at 37°C at
250 rpm until the O.D. 600 reaches 0.8.

21. Remove 5 ml and add to it 1.25 x 10^10 M13 helper phage.

22. Shake 30 min at 150 rpm and then 30 min at 250 rpm at
37°C.

23. Centrifuge 10 min at 3000 x g at RT.

24. Resuspend cells in 5 ml 2XYT with no glucose. (This step
removes glucose.)

25. Centrifuge as in step 23 and resuspend in 5 ml 2XYT with
kanamycin and the appropriate antibiotics (no glucose). Shake
18 hr at 37°C and 250 rpm.

26. Pellet cells at 10,000 x g and sterile filter the
phage-containing supernatant, which is now ready for round 2
screening.

27. Titer the round 1 eluted phage stocks.
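
The "appropriate amount" of medium in step 20 follows from a simple dilution relation. The sketch below is an illustration only, with hypothetical names and example readings; it assumes that optical density scales linearly with cell density, which holds only approximately for dense cultures (a dense suspension would normally be diluted before it is read).

```python
def medium_to_add_ml(current_od, current_volume_ml, target_od=0.4):
    """Volume of fresh medium needed to dilute a culture to the target O.D. 600.

    Uses OD1 * V1 = OD2 * (V1 + V_added); returns 0 if the culture is
    already at or below the target.
    """
    if current_od <= target_od:
        return 0.0
    return current_volume_ml * (current_od / target_od - 1.0)


# Hypothetical example: 5 ml of scraped cells reading O.D. 600 = 2.0.
print(f"Add {medium_to_add_ml(2.0, 5.0):.1f} ml of 2XYT/Antibiotics/Glucose")
```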

Binding; Specific Target/Phage Complexes Rounds 2-5:

6. Combine ~1 µg of biotinylated target protein with the
eluted and titered round 1 phage (10^9 pfu) in 200 µl of
Binding Buffer and rock 4 hr at 4°C.

7. The night before the round 2 screening is started,
prewash 200 µl (per target protein to be screened) of sheep
anti-mouse IgG magnetic beads (M280 IgG Dynabeads) with 2 x 1 ml
of Wash Buffer using the Dynal magnet. Let the beads collect at
least 1 min before removing the buffer. Let the beads stand
15 sec to allow residual Binding Buffer to collect and remove
with a P200 Pipetman.

8. Resuspend the washed beads in 200 µl of Binding Buffer
and add 100 µl of mouse anti-biotin IgG (Jackson IRL). Rock
overnight at 4°C.

10. Wash the unbound anti-biotin IgG from the Dynabeads by
placing them on the Dynal magnet for at least 1 min and remove
all liquid as in Step 7. Remove the tube from the magnet and
resuspend the beads in 1 ml of Wash Buffer, rock at 4°C for
30 min, and return to the magnet. Again let the beads pellet
for 1 min; repeat this process 3 more times and resuspend the
beads in 400 µl of Binding Buffer.

10a. The coated beads are now ready for use
(100 µl/round/target protein). The remainder can be stored
for use for up to 2 weeks.

11. Add the 100 µl of anti-biotin coated Dynabeads (Step 10)
to the protein/phage fraction (Step 9), bringing the total
binding volume to 300 µl, and rock for 2 hr at 4°C. Ensure
that the beads mix thoroughly with the phage/protein
solution.

Washing and Elution:

12. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

13. Remove the solution using a P1000 Pipetman and discard.
Let the beads stand 15 sec to allow residual binding buffer
to collect and remove with a P200 Pipetman. Note that serial
dilution depends upon all residual liquid being removed
(i.e., 5 µl into 500 is 100X washing; 50 µl into 500 is only
10X).

14. Remove the tube from the magnet and resuspend the beads
in 750 µl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

15. Remove the wash solution as in Step 7 and repeat this
process several more times.

16. After the removal of the final wash, resuspend the beads
and transfer them to a fresh, labeled tube and wash once
more.

17. To elute bound phage, add 400 µl of Elution Buffer,
titrate and rock for 14 min at RT.

18. Place the tube on the magnet for one minute and transfer
the eluate to a sterile 1.5 ml tube which contains 75 µl of
1 M Tris pH 9.1. Vortex briefly.

Amplification of Rounds 2-5 Eluted Phage:

15a. Plate 10 µl and 100 µl of round 2, 3, 4 eluates using
200 µl of contamination free (previously tested) E. coli
XL1-Blue cells onto each plate containing
tetracycline/ampicillin/glucose and tetracycline/ampicillin
and amplify as in Steps 18-26.

6.4.2. BIOTIN-ANTIBIOTIN IgG BEAD PROTOCOL 
In this example, a protocol is presented for screening a
phagemid library, in which a biotinylated target protein is
immobilized (by the specific binding between anti-biotin
antibodies and biotin) on a magnetic bead containing
anti-biotin antibodies on the bead surface. The immobilized
target protein is then contacted with library members to
select binders.

Reagents Used:

M280 Sheep anti-Mouse IgG coated Dynabeads (Dynal)

Binding; Specific Target/Phage Complexes Round 1:

6. Combine 10 µg of biotinylated target protein with the
phage library (>10^10 pfu) in 400 µl of Binding Buffer and rock
overnight at 4°C.

7. That same night prewash 50 µl sheep anti-mouse IgG
magnetic beads (M280 IgG Dynabeads) with 500 µl of Binding
Buffer twice using the Dynal magnet. Let the beads collect
at least 1 min before removing the buffer. Let the beads
stand 15 sec to allow residual binding buffer to collect and
remove with a P200 Pipetman.

8. Resuspend the washed beads in 100 µl of Binding Buffer
and add 33 µl of mouse anti-biotin IgG (40 µg, Jackson IRL).
Rock overnight at 4°C.

9. Remove unbound protein from the phage/protein reaction
in Step 6 with a Microcon 100. Spin at 800 x g until
exclusion volume is met and wash twice with Wash Buffer
(again at 800 x g). Collect phage/protein with a Pipetman and
add an additional 50 µl of Wash Buffer to the Microcon,
gently titrate and combine with first fraction to ensure
maximal recovery.

10. Wash the unbound anti-biotin IgG from the Dynabeads by
placing them on the Dynal magnet for at least 1 min and remove
all liquid as in Step 7. Remove the tube from the magnet and
resuspend the beads in 750 µl of Wash Buffer, rock at 4°C for
30 min, and return to the magnet. Again, let the beads
pellet for 1 min; repeat this process 3 more times and
resuspend the beads in 100 µl of Binding Buffer.

11. Add the anti-biotin coated Dynabeads (Step 10) to the
protein/phage fraction (Step 9), bring the total binding
volume to 500 µl with Binding Buffer, and rock for 2 hr at
RT. Ensure that the beads mix thoroughly with the
phage/protein solution.

Washing and Elution:

12. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

13. Remove the solution using a P1000 Pipetman and discard.
Let the beads stand 15 sec to allow residual binding buffer
to collect and remove with a P200 Pipetman. Note that serial
dilution depends upon all residual liquid being removed
(i.e., 5 µl into 500 is 100X washing; 50 µl into 500 is only
10X).

14. Remove the tube from the magnet and resuspend the beads
in 750 µl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

15. Remove the wash solution as in Step 7 and repeat this
process 3 more times.

16. After the removal of the fourth wash, resuspend the
beads and transfer them to a fresh, labeled tube and wash
once more.

17. To elute bound phage, add 400 µl of Elution Buffer,
titrate and rock for 14 min at RT.

18. Place the tube on the magnet for one minute and transfer
the eluate to a sterile 1.5 ml tube which contains 75 µl of
1 M Tris pH 9.1. Vortex briefly.

Amplification of Round 1 Eluted Phage:

17. Plate all of the eluted round 1 phage by adding 157 µl
of phage to 200 µl of cells incubated overnight (previously
checked to be free of contamination) in three aliquots.
Incubate 25 min in a 37°C water bath and then spread onto LB
agar/antibiotics plate containing 2% glucose. Place plates
upright in 37°C incubator until dry and then invert and
incubate overnight.

18. Scrape plates with 5 ml of 2XYT/Antibiotics/Glucose and
leave swirling for 30 min at RT.

19. Add the appropriate amount of 2XYT/Antibiotics/Glucose
to bring the O.D. 600 down to 0.4 and then grow at 37°C at
250 rpm until the O.D. 600 reaches 0.8.

20. Remove 5 ml and add to it 1.25 x 10^10 M13 helper phage.

21. Shake 30 min at 150 rpm and then 30 min at 250 rpm at
37°C.

22. Centrifuge 10 min at 3000 x g at RT.

23. Resuspend cells in 5 ml 2XYT with no glucose. (This step
removes glucose.)

24. Centrifuge as in step 22 and resuspend in 5 ml 2XYT with
kanamycin and the appropriate antibiotics (no glucose). Shake
18 hr at 37°C and 250 rpm.

25. Pellet cells at 10,000 x g and sterile filter the
phage-containing supernatant, which is now ready for round 2
screening.

Binding; Specific Target/Phage Complexes Rounds 2, 3, & 4:

6a. Bind 1 µg of target protein with 100 µl of amplified
phage from the previous round as before, overnight at 4°C.

7a. Prepare the IgG anti-biotin/anti-IgG beads as in Steps
7-10 using, however, only 20 µl of sheep anti-mouse IgG and
13 µl of anti-biotin IgG.

8a. All other binding procedures are identical with Steps
6-11.

Washing and Elution:

9a. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

10a. Remove the solution using a P1000 Pipetman and discard.
Let the beads stand 30 sec to allow residual Binding Buffer
to collect and remove with a P200 Pipetman.

11a. Remove the tube from the magnet and resuspend the beads
in 750 µl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

12a. Remove the wash solution as in Step 10a and repeat this
process 3 more times.

13a. After the removal of the fourth wash, resuspend the
beads and transfer them to a fresh, labeled tube and wash 4
more times.

14a. Elute and neutralize as in Steps 17 and 18 of the round 1
Washing and Elution.
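
The note in the round 1 washing steps, that serial dilution depends on removing all residual liquid, can be made concrete with a short calculation. The sketch below is an idealized illustration with hypothetical names: it follows the text's own convention (wash volume divided by residual volume, so 5 µl into 500 counts as 100X) and assumes that unbound phage mix freely into each wash and that the same residual volume is left behind every time.

```python
def fold_dilution_per_wash(residual_ul, wash_ul=500.0):
    """Dilution achieved by one wash, using the text's convention: wash / residual."""
    return wash_ul / residual_ul


def cumulative_fold_dilution(residual_ul, wash_ul, n_washes):
    """Idealized dilution of unbound phage after n identical washes."""
    return fold_dilution_per_wash(residual_ul, wash_ul) ** n_washes


# The two cases quoted in the protocol.
print(fold_dilution_per_wash(5.0), fold_dilution_per_wash(50.0))

# Eight washes of 750 µl (four washes, then four more in a fresh tube).
for residual in (5.0, 50.0):
    total = cumulative_fold_dilution(residual, 750.0, 8)
    print(f"{residual:>4} µl residual: {total:.2e}-fold")
```

Because the per-wash factor is raised to the number of washes, leaving ten times more liquid behind costs a factor of ten at every wash, which is why the protocol asks for a second pass with a P200 Pipetman.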

Amplification of Rounds 2, 3, & 4 Eluted Phage: 

15a. Plate 10 µl and 100 µl of round 2, 3, 4 eluates and
amplify as in Steps 17-25.

6.4.3. BIOTIN-STREPTAVIDIN, MAGNETIC
BEAD PROTOCOLS

In this example, a protocol is presented for screening a
phagemid library, in which a biotinylated target protein is
immobilized (by the specific binding between biotin and
streptavidin) on a streptavidin coated magnetic bead. The
immobilized target protein is then contacted with library
members to select binders.

Reagents Used:

Purified target protein, M280 streptavidin coated Dynabeads
(Dynal)

Binding; Specific Target/Phage Complexes Round 1:

6. Combine 10 µg of biotinylated target protein with the
phage library (>10^10 pfu) in 400 µl of Binding Buffer and rock
overnight at 4°C.

7. Remove unbound protein with a Microcon 100. Spin at
800 x g until exclusion volume is met, and wash twice with
Wash Buffer (again at 800 x g). Collect phage/protein with a
Pipetman and add an additional 50 µl of Wash Buffer to the
Microcon, gently titrate and combine with the first fraction
to ensure maximal recovery.

8. Prewash 50 µl (per reaction) of streptavidin magnetic
beads (M280 streptavidin Dynabeads) twice with 500 µl of
Wash Buffer using the Dynal magnet.

9. Add the prewashed Dynabeads to the protein/phage fraction
(add Binding Buffer to a total of 500 µl) and rock for 30 min.
Ensure that the beads mix thoroughly with the phage/protein
solution.

Washing and Elution:

10. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

11. Remove the solution using a P1000 Pipetman and discard.
Let the beads stand 15 sec to allow residual Binding Buffer to
collect and remove with a P200 Pipetman. Note that serial
dilution depends upon all residual liquid being removed (i.e.,
5 µl into 500 is 100X washing; 50 µl into 500 is only 10X).

12. Remove the tube from the magnet and resuspend the beads
in 750 µl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

13. Remove the wash solution as in step 11 and repeat this
process 3 more times.

14. After the removal of the fourth wash, resuspend the beads
and transfer them to a fresh, labeled tube and wash once more.

15. To elute bound phage add 400 µl of Elution Buffer,
titrate and rock for 14 min at RT.

16. Place the tube on the magnet for one minute and transfer
the eluate to a sterile 1.5 ml tube which contains 75 µl of
1 M Tris pH 9.1. Vortex briefly.

Amplification of Round 1 Eluted Phage:

17. Plate all of the eluted round 1 phage by adding 157 µl of
phage to 200 µl of overnight cells (previously checked to be
free of contamination) in three aliquots. Incubate 25 min in
a 37°C water bath and then spread onto LB agar/antibiotics
plate containing 2% glucose. Place plates upright in 37°C
incubator until dry and then invert and incubate overnight.

18. Scrape plates with 5 ml of 2XYT/Antibiotics/Glucose and
leave swirling for 30 min at RT.

19. Add the appropriate amount of 2XYT/Antibiotics/Glucose
to bring the O.D. 600 down to 0.4 and then grow at 37°C at 250
rpm until the O.D. 600 reaches 0.8.

20. Remove 5 ml and add to it 1.25 x 10^10 M13 helper phage.

21. Shake 30 min at 150 rpm and then 30 min at 250 rpm at
37°C.

22. Centrifuge 10 min at 3000 x g at RT.

23. Resuspend cells in 5 ml 2XYT with no glucose. (This step
removes glucose.)

24. Centrifuge as in step 22 and resuspend in 5 ml 2XYT with
kanamycin and the appropriate antibiotics (no glucose). Shake
18 hr at 37°C and 250 rpm.

25. Pellet cells at 10,000 x g and sterile filter the
phage-containing supernatant, which is now ready for round 2
screening.

Binding; Specific Target/Phage Complexes Rounds 2, 3, & 4:

6a. Combine 1 µg of biotinylated target protein with 100 µl
of the previous round's phage (>10^9 pfu) in 400 µl of Binding
Buffer and rock overnight at 4°C.

7a. Remove unbound protein with a Microcon 100. Spin at
800 x g until exclusion volume is met and wash twice with Wash
Buffer (again at 800 x g). Collect phage/protein with a
Pipetman and add an additional 50 µl of Wash Buffer to the
Microcon, gently titrate and combine with the first fraction
to ensure maximal recovery.

8a. Prewash 20 µl (per reaction) of streptavidin magnetic
beads (M280 streptavidin Dynabeads) twice with 500 µl of
Wash Buffer using the Dynal magnet.

9a. Add the prewashed Dynabeads to the protein/phage fraction,
add Binding Buffer to a total of 500 µl, and rock for 30 min.
Ensure that the beads mix thoroughly with the phage/protein
solution.

Washing and Elution:

10a. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

11a. Remove the solution using a P1000 Pipetman and discard.
Let the beads stand 30 sec to allow residual Binding Buffer to
collect and remove with a P200 Pipetman.

12a. Remove the tube from the magnet and resuspend the beads
in 750 µl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

13a. Remove the wash solution as in Step 11a and repeat this
process 3 more times.

14a. After the removal of the fourth wash, resuspend the beads
and transfer them to a fresh, labeled tube and wash 4 more
times.

15a. Elute and neutralize as in Steps 15 and 16.

Amplification of Rounds 2, 3, & 4 Eluted Phage:

16a. Plate 10 µl and 100 µl of round 2, 3, 4 eluates and amplify
as in Steps 17-25.

6.5. AFFINITY MEASUREMENTS OF
PEPTIDE-TARGET PROTEIN INTERACTIONS

Once peptides that bind to a target protein have been
identified, the affinities of these peptides for their
respective targets are determined by measuring the dissociation
constants (Kd) of each of these peptides for their respective
targets. Oligonucleotides that encode the peptides are
constructed so as to encode also an epitope tag fused to the
peptide (for example, the myc epitope) that can be detected by
a commercially available antibody. These oligonucleotides are
incubated with polysome extracts to produce the peptide tagged
with the epitope. Binding of the target protein to the
peptide is done in solution, and separation of the bound
peptide from the unbound peptide is done by immunoaffinity
purification using an anti-target protein antibody. This
immunoaffinity purification is done by a modified ELISA
(enzyme-linked immunosorbent assay) protocol, in which the
target protein-peptide mixture is exposed to the anti-target
protein antibody immobilized on a solid support such as a
nitrocellulose membrane, and the unbound peptide is then
washed off. In this protocol, the concentration of the target
protein is varied and then the amount of bound peptide is
estimated by detecting the epitope tag on the peptide by use
of anti-epitope antibody. In this manner, the affinity of
each peptide for its target protein can be determined.
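
One common way to turn such a titration into a dissociation constant is to fit a single-site binding curve. The sketch below is an assumption-laden illustration, not the procedure of this specification: it assumes 1:1 binding, target protein in large excess over peptide (so free target is approximately total target), and a detection signal proportional to bound peptide. The function names, the search range, and the synthetic data are all hypothetical.

```python
def bound_fraction(target_conc, kd):
    """Single-site binding: fraction of peptide bound at a given free target concentration."""
    return target_conc / (kd + target_conc)


def fit_kd(target_concs, signals, kd_min=1e-10, kd_max=1e-3, steps=2000):
    """Least-squares fit of signal = bmax * [T] / (Kd + [T]).

    Kd is scanned on a logarithmic grid; for each trial Kd the best
    bmax has a closed form because the model is linear in bmax.
    Returns (kd, bmax) in the units of the inputs.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        fractions = [bound_fraction(c, kd) for c in target_concs]
        bmax = (sum(s * f for s, f in zip(signals, fractions))
                / sum(f * f for f in fractions))
        sse = sum((s - bmax * f) ** 2 for s, f in zip(signals, fractions))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


# Synthetic, noise-free example data (molar concentrations, arbitrary signal units).
TRUE_KD, TRUE_BMAX = 2e-7, 1.5
concs = [1e-8, 3e-8, 1e-7, 3e-7, 1e-6, 3e-6, 1e-5]
signals = [TRUE_BMAX * bound_fraction(c, TRUE_KD) for c in concs]

kd_fit, bmax_fit = fit_kd(concs, signals)
print(f"fitted Kd = {kd_fit:.2e} M, fitted bmax = {bmax_fit:.2f}")
```

With real ELISA readings the same fit applies, but background subtraction and replicate measurements would be needed before the fitted value could be trusted.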

6.6. REDOR MEASUREMENTS ON A CX6C PEPTIDE RESIN

This example demonstrates successful synthesis and
cyclization of a CX6C peptide resin of greater than 95% purity
and with a labeled glycine, followed by successful REDOR
distance measurements on the CX6C peptide resin using the
preferred REDOR methods of this invention. The labeled
peptide used was
Cys-Asn-Thr-Leu-Lys-(15N-2-13C)Gly-Asp-Cys-Gly-mBHA resin, where
a glycine linker attached the peptide of interest to the mBHA
resin (Cys-Asn-Thr-Leu-Lys-Gly-Asp-Cys-Gly = SEQ ID NO:10).

The peptide resin was synthesized by solid phase
synthesis on p-methylbenzhydrylamine (mBHA) resin using a
combination of Boc and Fmoc chemistry. Methylbenzhydrylamine
resin (Subst. 0.36 meq/g) was purchased from Advanced Chem
Tech (Louisville, KY). Fmoc-(15N-2-13C)Gly was prepared from
HCl·(15N-2-13C)Gly (Isotec Inc., Miamisburg, OH) and Fmoc-OSu.
Boc-Gly, Fmoc-Cys(Trt), Fmoc-Asp(OtBu), Fmoc-Lys(Boc), Fmoc-Leu,
Fmoc-Thr(OtBu), Fmoc-Asn and Boc-Cys(Acm) were purchased from
Bachem (Torrance, CA). Reagent grade solvents were purchased
from Fisher Scientific. Diisopropylcarbodiimide (DIC),
trifluoroacetic acid (TFA) and diisopropylethylamine (DIEA)
were purchased from Chem Impex (Wood Dale, IL). Nitrogen and HF
were purchased from Air Products (San Diego, CA).
The first step 43 was the synthesis of
Boc-Cys(Acm)-Asn-Thr(OtBu)-Leu-Lys(Boc)-Gly-Asp(OtBu)-
Cys(Trt)-Gly-mBHA resin. 1.11 g (0.40 meq) of mBHA resin were
placed in a 150 ml reaction vessel (glass filter at the
bottom) with methylene chloride (CH2Cl2) ["DCM"] and stirred 15
min with a gentle bubbling of nitrogen in order to swell the
resin. The solvent was drained and the resin was neutralized
with DIEA 5% in DCM (3 x 2 min). After washes with DCM, the
resin was coupled 60 min with Boc-Gly (0.280 g; 1.6 meq; 4-fold
excess; 0.1 M) and DIC (0.25 ml; 1.6 meq; 4-fold excess; 0.1 M) in
DCM. Completion of the coupling was checked with the
Ninhydrin test. After washes, the resin was stirred 30 min in
TFA 55% in DCM in order to remove the Boc protecting group.
The resin was then neutralized with DIEA 5% in DCM and coupled
with Fmoc-Cys(Trt) (0.937 g; 1.6 meq; 4-fold excess; 0.1 M) and DIC
(0.25 ml; 1.6 meq; 4-fold excess; 0.1 M) in DCM/DMF (50/50).
After washes the resin was stirred with Piperidine 20% in DMF
(5 min and 20 min) in order to remove the Fmoc group. After
washes, this same cycle was repeated with Fmoc-Asp(OtBu),
Fmoc-(15N-2-13C)Gly (2-fold excess only), Fmoc-Lys(Boc),
Fmoc-Leu, Fmoc-Thr(OtBu), Fmoc-Asn and Boc-Cys(Acm). After the
last coupling, the Boc group was left on the peptide. The
resin was washed thoroughly with DCM and dried under a
nitrogen stream. Yield was 1.49 g (Expected: ~1.7 g).
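
The reagent quantities quoted in this step are mutually consistent, as a short calculation shows. The sketch below is illustrative only: the molecular weights are values supplied here from the molecular formulas of Boc-Gly-OH and Fmoc-Cys(Trt)-OH and are not taken from the specification, the function names are hypothetical, and milliequivalents are treated as millimoles because each reagent contributes one reactive group.

```python
RESIN_MASS_G = 1.11
SUBSTITUTION_MEQ_PER_G = 0.36
FOLD_EXCESS = 4
COUPLING_CONC_M = 0.1

# Molecular weights in g/mol (assumed values, not stated in the text).
MOLECULAR_WEIGHT = {
    "Boc-Gly": 175.18,
    "Fmoc-Cys(Trt)": 585.71,
}


def resin_loading_meq(resin_mass_g, substitution_meq_per_g):
    """Reactive sites on the resin, rounded to two decimals as quoted in the text."""
    return round(resin_mass_g * substitution_meq_per_g, 2)


def reagent_mass_g(reagent_meq, molecular_weight):
    """Mass of a monofunctional reagent for a given number of milliequivalents."""
    return reagent_meq * molecular_weight / 1000.0


def coupling_volume_ml(reagent_meq, concentration_molar):
    """Solvent volume giving the stated reagent concentration (mmol / (mol/l) = ml)."""
    return reagent_meq / concentration_molar


loading = resin_loading_meq(RESIN_MASS_G, SUBSTITUTION_MEQ_PER_G)
reagent_meq = loading * FOLD_EXCESS

print(f"resin loading: {loading:.2f} meq, reagent: {reagent_meq:.1f} meq")
for name, mw in MOLECULAR_WEIGHT.items():
    print(f"{name}: {reagent_mass_g(reagent_meq, mw):.3f} g")
print(f"coupling volume at {COUPLING_CONC_M} M: "
      f"{coupling_volume_ml(reagent_meq, COUPLING_CONC_M):.0f} ml")
```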

The next step 44 was cyclization of the
Boc-Cys-Asn-Thr(OtBu)-Leu-Lys(Boc)-Gly-Asp(OtBu)-Cys-Gly-mBHA
resin. 600 mg of protected peptide resin were sealed in a
polypropylene mesh packet. The bag was shaken in a mixture of
solvent (DCM/Methanol/Water, 640/280/47) in order to swell the
resin. The bag was then shaken 20 min in 100 ml of a solution
of iodine in the same mixture of solvent (0.4 mg I2/ml solvent
mixture). This operation was performed 4 times. No
decolorization was observed after the third time. The resin was
then thoroughly washed with DCM, DMF, DCM, and methanol
successively.

The last step 45 was side-chain deprotection of the
Cys-Asn-Thr-Leu-Lys-Gly-Asp-Cys-Gly-mBHA resin. After
cyclization the resin in the polypropylene bag was reacted 1.5
hours with 100 ml of a mixture of TFA/p-Cresol/Water
(95/2.5/2.5). After washes with DCM and methanol, the resin was
dried 48 hours under vacuum. Yield was 560 mg.

The resulting peptide resin was analyzed for its purity
and the presence of the disulfide bridge. 40 mg of resin were
sealed in a polypropylene mesh packet and treated with HF at 0°C
for 1 hour in the presence of anisole (HF/Anisole: 90/10). The
scavenger and by-products were extracted from the resin with
cold ethyl ether. The peptide was extracted with 10% acetic
acid and lyophilized 36 hours. The dry isolated peptide was
characterized by PDMS (mass spectrography) and HPLC (high
performance liquid chromatography). This analysis
demonstrated that greater than 95% of the product peptide was
of the correct amino acid composition, having a disulfide loop
and without inter-molecular disulfide dimers.

REDOR measurements were made on the peptide resin
prepared by this method, and as a control, also on dried
(15N-2-13C) labeled glycine. The preferred REDOR methods and
parameters, as previously detailed, were used. Fig. 6
illustrates the 15N resonance spectral signals obtained.
Signal 70 is the signal produced by dried glycine after 0
rotor periods. Signals 71, 72, 73 are glycine signals after
2, 4, and 8 rotor periods, respectively. Signals 74, 75, 76,
and 77 are the peptide resin signals after 0, 2, 4, and 8
rotor periods, respectively.

Fig. 7 illustrates the data analysis. As in Fig. 5, axis
81 is the ΔS/S axis, and axis 82 is the X axis. The variables
are as used in equation 5. Graph 83 is defined by equation 5,
and is the initial rising part of the full curve shown in Fig.
5. Data points 84, 85, 86, and 87 are best fits of the data
for 0, 2, 4, and 8 rotor periods, respectively. At these
points, the circles represent the glycine values and the
squares the peptide resin values. These values correspond to
a C-N distance in glycine and the peptide of 1.55 Å (and a D
of 800 Hz). Repeated measurements gave a C-N distance of
1.50 Å (and a D of 875 Hz). The accepted distance in glycine
is 1.48 Å. The above procedure was repeated for (15N-1-13C)
labeled glycine in
Cys-Asn-Thr-Leu-Lys-(15N-1-13C)Gly-Asp-Cys-Gly-mBHA resin, and
the measured C-N distance of 2.50 Å is in excellent agreement
with the predicted value of 2.46 Å.

Thus REDOR accuracy to better than 0.1 Å is demonstrated.

Also demonstrated is the peptide resin as an appropriate 
substrate for NMR measurements. Inter-molecular dipole-dipole 
interactions between adjacent peptides did not interfere. 
Also, the overlap of the distances measured in free glycine and
in glycine incorporated in the peptide demonstrated that the
peptide was held sufficiently rigidly by the resin that any 
remaining peptide motions did not interfere with the NMR 
measurements. 
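
The agreement between measured and reference C-N distances quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the type `redor_check` and the function `max_abs_deviation` are hypothetical names introduced here and are not part of the program listings; the numerical values are the ones stated in the text.

```c
#include <stddef.h>

/* One comparison: measured vs. reference C-N distance, in angstroms.
   (Hypothetical helper type, not part of the listings in Sec. 8.) */
typedef struct {
    double measured;
    double reference;
} redor_check;

/* Values quoted in the text above. */
static const redor_check redor_checks[] = {
    {1.55, 1.48},  /* glycine, first measurement vs. accepted distance */
    {1.50, 1.48},  /* glycine, repeated measurement vs. accepted distance */
    {2.50, 2.46}   /* labeled Gly in the peptide resin vs. predicted value */
};

/* Return the largest absolute deviation over n comparisons. */
double max_abs_deviation(const redor_check *c, size_t n)
{
    double worst = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        double d = c[i].measured - c[i].reference;
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling `max_abs_deviation(redor_checks, 3)` returns the largest of the three deviations, which is the quantity compared against the 0.1 Å bound stated above.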

7. SPECIFIC EMBODIMENTS, CITATION OF REFERENCES

The present invention is not to be limited in scope by 
the specific embodiments described herein. Indeed, various 
modifications of the invention in addition to those described 
herein will become apparent to those skilled in the art from
the foregoing description and accompanying figures. Such
modifications are intended to fall within the scope of the 
appended claims. 

Various publications are cited herein, the disclosures of 
which are incorporated by reference in their entireties. 

8. COMPUTER PROGRAM LISTINGS

These computer program listings are copyright 1995 of
CuraGen, Inc. © 1995 CuraGen, Inc.



START OF LISTING 
************************************



C CODE ROUTINES 



MAKEFILE AND GO PROC 
****************************************

MAKEFILE:

OPTIONS = -mips2 -ansi -g -fullwarn -O0

peptide.ex: random.o peptide.o peptide1.o peptide2.o peptide3.o peptide4.o \
            peptide5.o peptide6.o peptide7.o
	cc $(OPTIONS) random.o peptide*.o -lm -o peptide.ex
random.o: random.c
	cc $(OPTIONS) -c random.c
peptide.o: peptide.c *.h
	cc $(OPTIONS) -c peptide.c
peptide1.o: peptide1.c *.h
	cc $(OPTIONS) -c peptide1.c
peptide2.o: peptide2.c *.h
	cc $(OPTIONS) -c peptide2.c
peptide3.o: peptide3.c *.h
	cc $(OPTIONS) -c peptide3.c
peptide4.o: peptide4.c *.h
	cc $(OPTIONS) -c peptide4.c
peptide5.o: peptide5.c *.h
	cc $(OPTIONS) -c peptide5.c
peptide6.o: peptide6.c *.h
	cc $(OPTIONS) -c peptide6.c
peptide7.o: peptide7.c *.h
	cc $(OPTIONS) -c peptide7.c

GO PROC: 

peptide.ex << EOF
0.1
1
CGGGGGGC
EOF



MAIN PROGRAM - PEPTIDE.C

*************************

#define MAIN

#include "peptide.h"

/* The main program stub */

void main(int argc, char *argv[], char *envp[])
{
  logical *cyclic;
  int n_peptides, max_atoms_per_unit;
  int *n_amino_acids, *n_atoms_total, *n_side, *n_main;
  rigid_unit **peptide;
  torsion_list **torsion;
  hbond_list **hbond;
  atom_list **atom, **atom2;
  atom_info **atom_tmp;
  vector *twig[KMAX];
  int ***bond_table;
  string *sequence;
  int i, j;
  int list_num, max_atoms_total;
  double seed;
  regrowth **main, **side;

  printf("Enter random number seed ");
  scanf("%lf", &seed);
  ran2(seed);
  /* get linear sequences */
  get_sequence(&sequence, &n_peptides);
  printf("\n");
  /* allocate memory for arrays */
  if ((peptide = (rigid_unit **)
       malloc(n_peptides*sizeof(rigid_unit *))) == NULL)
    out_of_memory();
  if ((torsion = (torsion_list **)
       malloc(n_peptides*sizeof(torsion_list *))) == NULL)
    out_of_memory();
  if ((hbond = (hbond_list **) malloc(n_peptides*sizeof(hbond_list *))) == NULL)
    out_of_memory();
  if ((atom = (atom_list **) malloc(n_peptides*sizeof(atom_list *))) == NULL)
    out_of_memory();
  if ((atom2 = (atom_list **) malloc(n_peptides*sizeof(atom_list *))) == NULL)
    out_of_memory();
  if ((atom_tmp = (atom_info **) malloc(n_peptides*sizeof(atom_info *)))
      == NULL) out_of_memory();
  if ((main = (regrowth **) malloc(n_peptides*sizeof(regrowth *))) == NULL)
    out_of_memory();
  if ((side = (regrowth **) malloc(n_peptides*sizeof(regrowth *))) == NULL)
    out_of_memory();
  if ((bond_table = (int ***) malloc(n_peptides*sizeof(int **))) == NULL)
    out_of_memory();
  if ((n_amino_acids = (int *) malloc(n_peptides*sizeof(int))) == NULL)
    out_of_memory();
  if ((n_atoms_total = (int *) malloc(n_peptides*sizeof(int))) == NULL)
    out_of_memory();
  if ((cyclic = (logical *) malloc(n_peptides*sizeof(logical))) == NULL)
    out_of_memory();
  if ((n_main = (int *) malloc(n_peptides*sizeof(int))) == NULL)
    out_of_memory();
  if ((n_side = (int *) malloc(n_peptides*sizeof(int))) == NULL)
    out_of_memory();
  for (i=0; i<n_peptides; i++) {
    n_amino_acids[i] = (int) strlen(sequence[i]);
  }
  /* read in parameter files */
  read_torsion_data();
  read_lj_data();
  read_hbond_data();
  max_atoms_per_unit = 0;
  /* read in geometric sequence information */
  max_atoms_total = 0;
  for (i=0; i<n_peptides; i++) {
    peptide[i] = read_peptide_data(sequence[i], &n_atoms_total[i],
                                   &max_atoms_per_unit);
    cyclic[i] = (n_amino_acids[i] > 1) && (sequence[i][0] == 'C')
                && (sequence[i][n_amino_acids[i]-1] == 'C');
    if (cyclic[i]) peptide[i] = modify_cystine_ends(peptide[i],
                                                    n_amino_acids[i],
                                                    &n_atoms_total[i]);
    if (n_atoms_total[i] > max_atoms_total) max_atoms_total =
      n_atoms_total[i];
    n_main[i] = (cyclic[i]) ? 2*n_amino_acids[i] + 3 :
                              2*n_amino_acids[i] + 1;
    n_side[i] = n_amino_acids[i];
  }
  /* allocate sub arrays */
  for (i=0; i<KMAX; i++)
    if ((twig[i] = (vector *)
         malloc(max_atoms_total*sizeof(vector)))
        == NULL) out_of_memory();
  for (i=0; i<n_peptides; i++) {
    if ((atom[i] = (atom_list *)
         malloc(n_atoms_total[i]*sizeof(atom_list)))
        == NULL) out_of_memory();
    if ((atom2[i] = (atom_list *)
         malloc(n_atoms_total[i]*sizeof(atom_list)))
        == NULL) out_of_memory();
    if ((atom_tmp[i] = (atom_info *)
         malloc(n_atoms_total[i]*sizeof(atom_info)))
        == NULL) out_of_memory();
    if ((main[i] = (regrowth *)
         malloc(n_main[i]*sizeof(regrowth))) == NULL)
      out_of_memory();
    if ((side[i] = (regrowth *)
         malloc(n_side[i]*sizeof(regrowth))) == NULL)
      out_of_memory();
    if ((bond_table[i] = (int **)
         malloc(n_atoms_total[i]*sizeof(int *)))
        == NULL) out_of_memory();
    for (j=0; j<n_atoms_total[i]; j++)
      if ((bond_table[i][j] = (int *)
           malloc(MAX_BONDS*sizeof(int)))
          == NULL) out_of_memory();
  }
  /* loop over all peptides */
  for (i=0; i<n_peptides; i++) {
    get_main_side(peptide[i], main[i], side[i], &n_main[i],
                  &n_side[i]);
    /* determine connections */
    initialize_connection_table(bond_table[i], n_atoms_total[i]);
    list_num = 0;
    make_connection_table(bond_table[i], &list_num, peptide[i],
                          peptide[i]);
    /*print_connection_table(bond_table[i], n_atoms_total[i]);*/
    list_num = 0;
    /* assign noncoordinate information in atom array */
    assign_atom_pointers(&list_num, peptide[i], peptide[i],
                         atom[i]);
    /* get H-bonds and torsion lists */
    get_hbonds(&hbond[i], atom[i], n_atoms_total[i]);
    /*print_hbonds(hbond[i], atom[i]);*/
    list_num = 0;
    torsion[i] = NULL;
    get_torsions(&torsion[i], bond_table[i], &list_num, atom[i],
                 peptide[i],
                 peptide[i]);
    assign_lj_parameters(peptide[i], peptide[i]);
    /* copy noncoordinate information in atom to atom2 */
    for (j=0; j<n_atoms_total[i]; j++) atom2[i][j] = atom[i][j];
  }
  /* do the Monte Carlo */
  do_mc(peptide[0], torsion[0], hbond[0], atom[0], atom2[0],
        atom_tmp[0],
        twig, main[0], side[0], n_amino_acids[0],
        n_atoms_total[0], n_main[0], n_side[0], cyclic[0]);
  /*print_torsions(torsion[0], atom[0]);*/
  write_car_file(n_amino_acids[0], n_atoms_total[0], atom[0],
                 "test.car");

} 

#undef MAIN 
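
The main program treats a sequence as cyclic when it has more than one residue and both terminal residues are cysteine, and it sizes the main-chain regrowth array accordingly. A minimal sketch of that rule follows; the function names `is_cyclic` and `main_unit_count` are hypothetical and introduced only for illustration, while the expressions themselves are taken from the listing above.

```c
#include <string.h>

/* Nonzero when the 1-letter sequence has more than one residue and
   both the first and last residues are cysteine ('C').
   (Hypothetical helper; the listing computes this inline.) */
int is_cyclic(const char *sequence)
{
    size_t n = strlen(sequence);
    return n > 1 && sequence[0] == 'C' && sequence[n - 1] == 'C';
}

/* Initial allocation size for main-chain units, using the same
   expressions as the listing: 2n+3 if cyclic, 2n+1 otherwise. */
int main_unit_count(const char *sequence)
{
    int n = (int) strlen(sequence);
    return is_cyclic(sequence) ? 2 * n + 3 : 2 * n + 1;
}
```

For the sequence CGGGGGGC used in the GO PROC above, `is_cyclic` is true and `main_unit_count` evaluates 2*8 + 3.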



INPUT/OUTPUT ROUTINES - PEPTIDE1.C
*****************************************************************

/* input/output routines */
#include "peptide.h"

/* hardcoded AMBER rules have the keyword AMBER nearby
*/

#define NT_CT_DISTANCE 1.4750
#define S_S_DISTANCE 2.0380
#define P_CHARGE 0.048
#define C_CHARGE1 -0.098
#define C_CHARGE2 0.050
#define C_CHARGE3 0.050
#define C_CHARGE4 0.824
#define C_CHARGE5 -0.405
#define C_CHARGE6 -0.405

/* This function is called when out of memory
*/
void out_of_memory(void)
{
  printf("Out of memory error\n");
  exit(1);

} 
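
The listings repeat the pattern "call malloc, compare with NULL, call out_of_memory" at every allocation site. One possible way to centralize that pattern is sketched below. This is not part of the original program: `checked_malloc` is a hypothetical helper, and the overflow check is an added assumption about what a careful wrapper might do.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate n_items * item_size bytes or terminate the program.
   (Hypothetical wrapper, not present in the original listings.) */
void *checked_malloc(size_t n_items, size_t item_size)
{
    void *p;
    size_t bytes;

    /* Refuse requests whose byte count would overflow size_t. */
    if (item_size != 0 && n_items > (size_t) -1 / item_size) {
        fprintf(stderr, "Out of memory error\n");
        exit(1);
    }
    bytes = n_items * item_size;
    if (bytes == 0) bytes = 1;  /* malloc(0) may legitimately return NULL */
    p = malloc(bytes);
    if (p == NULL) {
        fprintf(stderr, "Out of memory error\n");
        exit(1);
    }
    return p;
}
```

With such a helper, an allocation like the one for `n_main` could be written as `n_main = checked_malloc(n_peptides, sizeof(int));`.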

/* This routine returns the 1-letter amino acid sequences
*/

void get_sequence(string **sequence, int *n_peptides)
{
#define SEQUENCE_LENGTH 80
  int i;

  printf("Enter number of peptides: ");
  scanf("%d", n_peptides);
  if ((*sequence = (string *) malloc(*n_peptides*sizeof(string))) == NULL)
    out_of_memory();
  for (i=0; i<*n_peptides; i++)
    if (((*sequence)[i] = (string)
         malloc(SEQUENCE_LENGTH*sizeof(char)))
        == NULL) out_of_memory();
  for (i=0; i<*n_peptides; i++) {
    printf("Enter peptide sequence %d: ", i);
    scanf("%s", (*sequence)[i]);
  }
#undef SEQUENCE_LENGTH
}

/* read in the data files associated with this sequence 
*/ 

rigid_unit *read_peptide_data(string sequence, int *n_atoms_total,
                              int *max_atoms_per_unit)
{
  int i, n_amino_acids;
  /* the first character is overwritten with the residue code below */
  char name[] = "X.dat";
  acid_label label;
  rigid_unit *u1, *u2, *ret;

  /* check amino acids in sequence */
  n_amino_acids = strlen(sequence);
  for (i=0; i<n_amino_acids; i++) {
    label = amino_acid_code(sequence[i]);
    if (label == BAD) {
      printf("Invalid amino acid code %c\n", sequence[i]);
      exit(1);
    }
    if (label == P) {
      printf("Proline not yet supported\n");
      exit(1);
    }
  }
  *n_atoms_total = 0;
  /* add unit A */
  label = amino_acid_code(sequence[0]);
  u1 = read_unit("unitA.dat", label, 0, n_atoms_total,
                 max_atoms_per_unit);
  ret = u1;
  for (i=0; i<n_amino_acids; i++) {
    name[0] = sequence[i];
    label = amino_acid_code(sequence[i]);
    /* add unit B */
    u2 = read_unit("unitB.dat", label, i, n_atoms_total,
                   max_atoms_per_unit);
    u2->type = nonCunit;
    /* follow IUPAC naming rules if glycine */
    if (label == G) strcpy(u2->atom[1].name, "HA1");
    /* follow AMBER charge rules if alanine or proline */
    if (label == A || label == P) u2->atom[1].charge = P_CHARGE;
    if (i==0) u2->head.axis = vector_scale(u2->head.axis,
                                           NT_CT_DISTANCE);
    couple_unit(u1, u2);
    u1 = u2;
    /* add residue */
    u2 = read_unit(name, label, i, n_atoms_total,
                   max_atoms_per_unit);
    couple_unit(u1, u2);
    /* add unit C or D */
    u2 = read_unit((i==n_amino_acids-1) ? "unitD.dat" : "unitC.dat",
                   label, i, n_atoms_total, max_atoms_per_unit);
    if (i < n_amino_acids-1) {
      /* align incoming and outgoing bonds */
      u2->bond[0]->tail.axis = vector_scale(u2->head.axis, 1.0);
      u2->type = Cunit;
      label = amino_acid_code(sequence[i+1]);
      u2->atom[2].residue = u2->atom[3].residue = label;
      u2->atom[2].residue_num = u2->atom[3].residue_num = i+1;
    }
    couple_unit(u1, u2);
    u1 = u2;
  }
  return(ret);

} 
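
The order in which read_peptide_data opens its data files can be made explicit with a small stand-alone sketch. It assumes, as the assignment `name[0] = sequence[i]` suggests, that each residue file is named by its 1-letter code followed by ".dat"; the functions `append_name` and `unit_file_order` are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Append s to out, separated by a single space.
   Returns 0 on success, -1 if out (capacity cap) is too small. */
static int append_name(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);

    if (used + strlen(s) + 2 > cap) return -1;
    if (used > 0) strcat(out, " ");
    strcat(out, s);
    return 0;
}

/* List the data files read for a sequence, in order: unit A once,
   then for each residue unit B, the residue file, and unit C
   (or unit D for the last residue). Returns the number of files,
   or -1 if the buffer is too small. */
int unit_file_order(const char *sequence, char *out, size_t cap)
{
    size_t i, n = strlen(sequence);
    char name[] = "X.dat";  /* assumed naming: residue code + ".dat" */
    int count = 0;

    if (cap == 0) return -1;
    out[0] = '\0';
    if (n == 0) return 0;
    if (append_name(out, cap, "unitA.dat") != 0) return -1;
    count++;
    for (i = 0; i < n; i++) {
        name[0] = sequence[i];
        if (append_name(out, cap, "unitB.dat") != 0) return -1;
        if (append_name(out, cap, name) != 0) return -1;
        if (append_name(out, cap,
                        i == n - 1 ? "unitD.dat" : "unitC.dat") != 0)
            return -1;
        count += 3;
    }
    return count;
}
```

A sequence of n residues therefore corresponds to 3n + 1 calls to read_unit.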

/* This routine reads in a rigid unit data file 
*/ 

rigid_unit *read_unit(string file, acid_label label, int residue_num,
                      int *n_atoms_total, int *max_atoms_per_unit)
{
#define LINE_LEN 200
  FILE *fp;
  int i, j, k, i1, n_rigid_units;
  char stmp1[NAME_LENGTH], stmp2[NAME_LENGTH], line[LINE_LEN];
  rigid_unit **utmp;

  if ((fp = fopen(file, "r")) == NULL) {
    printf("Data file %s does not exist\n", file);
    exit(1);
  }
  /* read in number of rigid units */
  getline(line, LINE_LEN, fp);
  sscanf(line, "%d", &n_rigid_units);
  /* printf("%d\n", n_rigid_units); */
  if ((utmp = (rigid_unit **)
       malloc(n_rigid_units*sizeof(rigid_unit *))) == NULL)
    out_of_memory();
  /* allocate rigid unit */
  for (i=0; i<n_rigid_units; i++) {
    if ((utmp[i] = (rigid_unit *)
         malloc(sizeof(rigid_unit))) == NULL) out_of_memory();
    utmp[i]->type = UNKNOWN;
    getline(line, LINE_LEN, fp);
    sscanf(line, "%d", &utmp[i]->n_atoms);
    *n_atoms_total += utmp[i]->n_atoms;
    if (utmp[i]->n_atoms > *max_atoms_per_unit)
      *max_atoms_per_unit = utmp[i]->n_atoms;
    /* printf("%d\n", utmp[i]->n_atoms); */
    if ((utmp[i]->atom = (atom_info *)
         malloc(utmp[i]->n_atoms*sizeof(atom_info))) == NULL)
      out_of_memory();
    /* read in atoms */
    for (j=0; j<utmp[i]->n_atoms; j++) {
      getline(line, LINE_LEN, fp);
      sscanf(line, "%s %lf %lf %lf %s %d %s %s %lf",
             utmp[i]->atom[j].name,
             &utmp[i]->atom[j].position.x,
             &utmp[i]->atom[j].position.y,
             &utmp[i]->atom[j].position.z,
             &stmp1, &i1,
             utmp[i]->atom[j].type, &stmp2,
             &utmp[i]->atom[j].charge);
      /* printf("%s %lf %lf %lf %s %lf\n",
                utmp[i]->atom[j].name,
                utmp[i]->atom[j].position.x,
                utmp[i]->atom[j].position.y,
                utmp[i]->atom[j].position.z,
                utmp[i]->atom[j].type,
                utmp[i]->atom[j].charge); */
      utmp[i]->atom[j].residue = label;
      utmp[i]->atom[j].residue_num = residue_num;
    }
  }
  for (i=0; i<n_rigid_units; i++) {
    /* allocate incoming bond vector information */
    getline(line, LINE_LEN, fp);
    sscanf(line, "%d %d %d %d %d", &i1, &utmp[i]->head.bond[0],
           &utmp[i]->head.bond[1], &utmp[i]->head.bond[2],
           &utmp[i]->head.bond[3]);
    /* printf("%d %d %d %d %d\n", i1, utmp[i]->head.bond[0],
              utmp[i]->head.bond[1], utmp[i]->head.bond[2],
              utmp[i]->head.bond[3]); */
    for (j=4; j<MAX_BONDS; j++) utmp[i]->head.bond[j] = -1;
    utmp[i]->head.atom_num = i1;
    getline(line, LINE_LEN, fp);
    sscanf(line, "%lf %lf %lf", &utmp[i]->head.axis.x,
           &utmp[i]->head.axis.y,
           &utmp[i]->head.axis.z);
    /* printf("%lf %lf %lf\n", utmp[i]->head.axis.x,
              utmp[i]->head.axis.y,
              utmp[i]->head.axis.z); */
    utmp[i]->head.axis.x = utmp[i]->atom[i1].position.x - utmp[i]->head.axis.x;
    utmp[i]->head.axis.y = utmp[i]->atom[i1].position.y - utmp[i]->head.axis.y;
    utmp[i]->head.axis.z = utmp[i]->atom[i1].position.z - utmp[i]->head.axis.z;
    /* allocate outgoing bond pointers */
    getline(line, LINE_LEN, fp);
    sscanf(line, "%d", &utmp[i]->n_bonds);
    if ((utmp[i]->bond = (bond_type **)
         malloc(utmp[i]->n_bonds*sizeof(bond_type *))) == NULL)
      out_of_memory();
    for (j=0; j<utmp[i]->n_bonds; j++) {
      if ((utmp[i]->bond[j] = (bond_type *)
           malloc(sizeof(bond_type))) == NULL)
        out_of_memory();
      getline(line, LINE_LEN, fp);
      sscanf(line, "%d", &i1);
      /* printf("%d\n", i1); */
      utmp[i]->bond[j]->next = (i1==-1) ? NULL : utmp[i1];
      getline(line, LINE_LEN, fp);
      sscanf(line, "%d %d %d %d %d", &i1,
             &utmp[i]->bond[j]->tail.bond[0],
             &utmp[i]->bond[j]->tail.bond[1],
             &utmp[i]->bond[j]->tail.bond[2],
             &utmp[i]->bond[j]->tail.bond[3]);
      /* printf("%d %d %d %d %d\n", i1,
                utmp[i]->bond[j]->tail.bond[0],
                utmp[i]->bond[j]->tail.bond[1],
                utmp[i]->bond[j]->tail.bond[2],
                utmp[i]->bond[j]->tail.bond[3]); */
      for (k=4; k<MAX_BONDS; k++) utmp[i]->bond[j]->tail.bond[k]
        = -1;
      utmp[i]->bond[j]->tail.atom_num = i1;
      getline(line, LINE_LEN, fp);
      sscanf(line, "%lf %lf %lf", &utmp[i]->bond[j]->tail.axis.x,
             &utmp[i]->bond[j]->tail.axis.y,
             &utmp[i]->bond[j]->tail.axis.z);
      utmp[i]->bond[j]->tail.axis.x -= utmp[i]->atom[i1].position.x;
      utmp[i]->bond[j]->tail.axis.y -= utmp[i]->atom[i1].position.y;
      utmp[i]->bond[j]->tail.axis.z -= utmp[i]->atom[i1].position.z;
      utmp[i]->bond[j]->tail.axis =
        vector_scale(utmp[i]->bond[j]->tail.axis, 1.0);
    }
  }
  fclose(fp);
  return(utmp[0]);
#undef LINE_LEN
} 

/* This routine couples two rigid units 
*/ 

void couple_unit(rigid_unit *unit1, rigid_unit *unit2)
{
  bond_type **bond;

  for (bond=unit1->bond; bond[0]->next; bond++);
  bond[0]->next = unit2;

} 
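
couple_unit attaches the second unit to the first outgoing bond of the first unit whose next pointer is still empty. The sketch below reproduces that idea on simplified, hypothetical types (`demo_unit`, `demo_bond`); unlike the listing, it bounds the scan by the bond count and reports failure when no free bond exists, which is an added safeguard rather than the original behavior.

```c
#include <stddef.h>

typedef struct demo_unit demo_unit;

/* Simplified stand-ins for bond_type and rigid_unit (hypothetical). */
typedef struct {
    demo_unit *next;       /* unit attached at the far end, or NULL */
} demo_bond;

struct demo_unit {
    int n_bonds;           /* number of outgoing bonds */
    demo_bond **bond;      /* array of n_bonds bond pointers */
};

/* Attach u2 to the first free outgoing bond of u1.
   Returns 0 on success, -1 if every bond is already occupied. */
int demo_couple(demo_unit *u1, demo_unit *u2)
{
    int i;

    for (i = 0; i < u1->n_bonds; i++) {
        if (u1->bond[i]->next == NULL) {
            u1->bond[i]->next = u2;
            return 0;
        }
    }
    return -1;
}
```

Successive calls on the same first unit fill its bonds in array order, which is how read_peptide_data attaches first the residue side group and then the next backbone unit to each unit B.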

/* This routine turns a linear CX_nC peptide into a cyclic 
disulfide-bonded peptide 

*/ 

rigid_unit *modify_cystine_ends(rigid_unit *unit, int n_amino_acids,
                                int *n_atoms_total)
{
  int i;
  rigid_unit *unit1, *unit2, *unit3, *unit4, *unit5, *unit6;
  double len;
  vector head1, head2;
  bond_type *btmp;
  /* get new first unit */
  unit1 = unit->bond[0]->next;
  unit2 = unit1->bond[0]->next;
  unit3 = unit2->bond[0]->next;
  /* save head vectors */
  head1 = unit1->head.axis;
  head2 = unit2->head.axis;
  /* modify A unit to be a side group */
  len = vector_length(unit1->head.axis);
  unit->head = unit->bond[0]->tail;
  unit->head.axis.x *= -len;
  unit->head.axis.y *= -len;
  unit->head.axis.z *= -len;
  unit->n_bonds = 0;
  /* modify C_alpha head */
  len = vector_length(unit2->head.axis);
  unit1->head = unit1->bond[0]->tail;
  unit1->head.axis.x *= -len;
  unit1->head.axis.y *= -len;
  unit1->head.axis.z *= -len;
  /* modify C_beta head */
  len = vector_length(unit3->head.axis);
  unit2->head = unit2->bond[0]->tail;
  unit2->head.axis.x *= -len;
  unit2->head.axis.y *= -len;
  unit2->head.axis.z *= -len;
  /* modify S tail */
  unit3->bond = unit->bond;
  unit3->head.bond[2] = -1;
  unit3->bond[0]->tail = unit3->head;
  unit3->bond[0]->tail.axis = vector_scale(unit3->head.axis,
                                           -1.0);
  unit3->bond[0]->next = unit2;
  unit3->n_bonds = 1;
  unit3->n_atoms--;
  (*n_atoms_total)--;
  /* modify S head */
  unit3->head.axis = unit3->atom[0].position;
  unit3->head.axis.x -= unit3->atom[3].position.x;
  unit3->head.axis.y -= unit3->atom[3].position.y;
  unit3->head.axis.z -= unit3->atom[3].position.z;
  /* modify C_beta tail */
  unit2->bond[0]->tail.axis = vector_scale(head2, -1.0);
  unit2->bond[0]->next = unit1;
  /* modify C_alpha tail */
  unit1->bond[0]->tail.axis = vector_scale(head1, -1.0);
  unit1->bond[0]->next = unit;
  unit4 = unit1;
  /* find last B unit */
  for (i=1; i<n_amino_acids; i++) {
    unit4 = unit4->bond[unit4->n_bonds-1]->next;
    unit4 = unit4->bond[unit4->n_bonds-1]->next;
  }
  unit5 = unit4->bond[0]->next;
  unit6 = unit5->bond[0]->next;
  /* swap bond 0 and bond 1 for unit 4 */
  btmp = unit4->bond[0];
  unit4->bond[0] = unit4->bond[1];
  unit4->bond[1] = btmp;
  /* modify S tail */
  if ((unit6->bond = (bond_type **) malloc(sizeof(bond_type *)))
      == NULL)
    out_of_memory();
  if ((unit6->bond[0] = (bond_type *) malloc(sizeof(bond_type)))
      == NULL)
    out_of_memory();
  unit6->head.bond[2] = -1;
  unit6->bond[0]->tail = unit6->head;
  unit6->bond[0]->next = unit3;
  unit6->n_bonds = 1;
  unit6->n_atoms--;
  (*n_atoms_total)--;
  unit6->bond[0]->tail.axis = unit6->atom[3].position;
  unit6->bond[0]->tail.axis.x -= unit6->atom[0].position.x;
  unit6->bond[0]->tail.axis.y -= unit6->atom[0].position.y;
  unit6->bond[0]->tail.axis.z -= unit6->atom[0].position.z;
  unit6->bond[0]->tail.axis =
    vector_scale(unit6->bond[0]->tail.axis, 1.0);
  /* use AMBER S-S bond length */
  unit3->head.axis = vector_scale(unit3->head.axis, S_S_DISTANCE);
  /* modify cystine S types to obey AMBER rules */
  strcpy(unit3->atom[0].type, "S");
  strcpy(unit6->atom[0].type, "S");
  /* modify cystine charges to obey AMBER rules */
  unit2->atom[0].charge = C_CHARGE1;
  unit2->atom[1].charge = C_CHARGE2;
  unit2->atom[2].charge = C_CHARGE3;
  unit3->atom[0].charge = C_CHARGE4;
  unit3->atom[1].charge = C_CHARGE5;
  unit3->atom[2].charge = C_CHARGE6;
  unit5->atom[0].charge = C_CHARGE1;
  unit5->atom[1].charge = C_CHARGE2;
  unit5->atom[2].charge = C_CHARGE3;
  unit6->atom[0].charge = C_CHARGE4;
  unit6->atom[1].charge = C_CHARGE5;
  unit6->atom[2].charge = C_CHARGE6;
  /* reassign first unit */
  return(unit3);

} 

/* This routine determines the main and side unit pointers 
*/ 

void get_main_side(rigid_unit *unit, regrowth *main, regrowth *side,
                   int *n_main, int *n_side)
{
  rigid_unit *start, *unit2, *lastmain;
  regrowth *main0;
  int i;

  main0 = main;
  *n_side = 0;
  *n_main = 0;
  start = unit;
  lastmain = NULL;
  do {
    main->unit = unit;
    main->prev = lastmain;
    main++;
    (*n_main)++;
    for (i=0; i<unit->n_bonds-1; i++) {
      unit2 = unit->bond[i]->next;
      if (unit2->atom[0].residue != G) {
        side->unit = unit2;
        side->prev = unit;
        side++;
        (*n_side)++;
      }
    }
    lastmain = unit;
    unit = unit->bond[i]->next;
  } while (start != unit && unit->n_bonds > 0);
  if (unit->n_bonds == 0) {
    main->unit = unit;
    main->prev = lastmain;
    main++;
    (*n_main)++;
  } else {
    main0->prev = lastmain;

} 

} 

/* This routine reads in the torsion data file 
*/ 

void read_torsion_data(void)
{
#define LINE_LEN 200
  FILE *fp;
  char line[LINE_LEN];
  int n_torsions, itmp, i;
  double ftmp;
  torsion_data **data;

  if ((fp = fopen("torsion.dat", "r")) == NULL) {
    printf("Data file torsion.dat does not exist\n");
    exit(1);
  }
  getline(line, LINE_LEN, fp);
  sscanf(line, "%d", &n_torsions);
  if ((torsion_data_list = (torsion_data **)
       malloc((n_torsions+1)*sizeof(torsion_data *))) == NULL)
    out_of_memory();
  data = torsion_data_list;
  data[n_torsions] = NULL;
  for (i=0; i<n_torsions; i++) {
    if ((data[i] = (torsion_data *) malloc(sizeof(torsion_data)))
        == NULL)
      out_of_memory();
    getline(line, LINE_LEN, fp);
    sscanf(line, "%lf %d %s %s %s %s %lf %lf %lf %lf %lf %lf",
           &ftmp, &itmp, data[i]->type1,
           data[i]->type2, data[i]->type3, data[i]->type4,
           &data[i]->v0[0], &data[i]->phi0[0],
           &data[i]->v0[1], &data[i]->phi0[1],
           &data[i]->v0[2], &data[i]->phi0[2]);
    data[i]->phi0[0] *= PI/180.0;
    data[i]->phi0[1] *= PI/180.0;
    data[i]->phi0[2] *= PI/180.0;
  }
  fclose(fp);
#undef LINE_LEN 

} 
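
Each torsion record is read with a twelve-field sscanf format, and the three phase angles are then converted from degrees to radians. The stand-alone sketch below shows the same parsing step with the return value of sscanf checked. The type `torsion_record`, the function `parse_torsion_line`, and the field width `TYPE_LEN` are assumptions made for this example; the actual torsion_data layout is defined in peptide.h, which is not reproduced here.

```c
#include <stdio.h>

#define DEMO_PI 3.14159265358979323846
#define TYPE_LEN 8   /* assumed buffer size for an atom-type label */

/* Hypothetical stand-in for torsion_data. */
typedef struct {
    char type1[TYPE_LEN], type2[TYPE_LEN];
    char type3[TYPE_LEN], type4[TYPE_LEN];
    double v0[3];      /* barrier terms, as read */
    double phi0[3];    /* phase angles, radians after parsing */
} torsion_record;

/* Parse one line in the field order used by read_torsion_data.
   Returns 0 on success, -1 if the line does not hold 12 fields. */
int parse_torsion_line(const char *line, torsion_record *r)
{
    double skipped_real;   /* leading fields are read and discarded, */
    int skipped_int;       /* as with ftmp and itmp in the listing   */
    int k, n;

    n = sscanf(line, "%lf %d %7s %7s %7s %7s %lf %lf %lf %lf %lf %lf",
               &skipped_real, &skipped_int,
               r->type1, r->type2, r->type3, r->type4,
               &r->v0[0], &r->phi0[0],
               &r->v0[1], &r->phi0[1],
               &r->v0[2], &r->phi0[2]);
    if (n != 12) return -1;
    for (k = 0; k < 3; k++)
        r->phi0[k] *= DEMO_PI / 180.0;   /* degrees to radians */
    return 0;
}
```

The width-limited `%7s` conversions keep an over-long type label from overrunning its buffer, a check the original format string does not make.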

/* This routine reads in the Lennard-Jones data file
*/

void read_lj_data(void)
{
#define LINE_LEN 200
  FILE *fp;
  char line[LINE_LEN];
  int n_terms, itmp, i;
  double ftmp;
  lj_data **data;

  if ((fp = fopen("lj_param.dat", "r")) == NULL) {
    printf("Data file lj_param.dat does not exist\n");
    exit(1);
  }
  getline(line, LINE_LEN, fp);
  sscanf(line, "%d", &n_terms);
  if ((lj_data_list = (lj_data **)
       malloc((n_terms+1)*sizeof(lj_data *)))
      == NULL) out_of_memory();
  data = lj_data_list;
  data[n_terms] = NULL;
  for (i=0; i<n_terms; i++) {
    if ((data[i] = (lj_data *) malloc(sizeof(lj_data))) == NULL)
      out_of_memory();
    getline(line, LINE_LEN, fp);
    sscanf(line, "%lf %d %s %lf %lf", &ftmp, &itmp, data[i]->type,
           &data[i]->ri, &data[i]->ei);
  }
  fclose(fp);
#undef LINE_LEN 
} 

/* This routine reads in the H-bond data file 
*/ 

void read_hbond_data(void)
{
#define LINE_LEN 200
  FILE *fp;
  char line[LINE_LEN];
  int n_terms, itmp, i;
  double ftmp;
  hbond_data **data;

  if ((fp = fopen("hbond.dat", "r")) == NULL) {
    printf("Data file hbond.dat does not exist\n");
    exit(1);
  }
  getline(line, LINE_LEN, fp);
  sscanf(line, "%d", &n_terms);
  if ((hbond_data_list = (hbond_data **)
       malloc((n_terms+1)*sizeof(hbond_data *))) == NULL)
    out_of_memory();
  data = hbond_data_list;
  data[n_terms] = NULL;
  for (i=0; i<n_terms; i++) {
    if ((data[i] = (hbond_data *) malloc(sizeof(hbond_data)))
        == NULL)
      out_of_memory();
    getline(line, LINE_LEN, fp);
    sscanf(line, "%lf %d %s %s %lf %lf",
           &ftmp, &itmp, data[i]->type1,
           data[i]->type2, &data[i]->a, &data[i]->b);
  }
  fclose(fp);
#undef LINE_LEN

} 

/* write out the BIOSYM car files associated with this sequence 
*/ 

void write_car_file(int n_amino_acids, int n_atoms_total, atom_list *atom,
                    string file)
{
  int i;
  char name[NAME_LENGTH];
  FILE *fp;
  time_t t;

  if ((fp = fopen(file, "w")) == NULL) {
    printf("Cannot open car file %s\n", file);
    exit(1);
  }
  fprintf(fp, "!BIOSYM archive 3\n");
  fprintf(fp, "PBC=OFF\n\n");
  t = time(NULL);
  fprintf(fp, "!DATE %s", ctime(&t));
  for (i=0; i<n_atoms_total; i++) {
    amino_acid_code_3(atom[i].p->residue, name);
    capitalize(name);
    if (atom[i].p->residue_num == n_amino_acids-1)
      strcat(name, "N");
    else if (atom[i].p->residue_num == 0)
      strcat(name, "n");
    else if (atom[i].p->residue == C)
      strcat(name, "H");
    fprintf(fp, "%-5s%15.9f%15.9f%15.9f %-4s %-3d %-2s %2c%8.3f\n",
            atom[i].p->name,
            atom[i].position.x, atom[i].position.y,
            atom[i].position.z, name, atom[i].p->residue_num+1,
            atom[i].p->type,
            atom[i].p->type[0], atom[i].p->charge);
  }
  fprintf(fp, "end\nend\n");
  fclose(fp);

} 

/* this routine returns the next valid line from the file 
*/ 

string getline(string line, int len, FILE *fp)
{
  string ret;
  do {
    ret = fgets(line, len, fp);
    strip(line);
  } while (ret != NULL && *line=='\x0');
  return(ret);

} 

/* strip CR and LF from the end of a string 
also ignore everything to the right of ! 

*/ 

void strip(string string)
{
  for (; *string != '\x0' && *string != '\xA' && *string != '\xD'
         && *string != '!'; string++)
    ;
  *string = '\x0';

} 
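
Together, getline and strip implement a simple data-file convention: everything from a '!' to the end of the line is a comment, line terminators are removed, and lines that end up empty are skipped. A minimal stand-alone sketch of this convention follows. The names `strip_line` and `next_data_line` are hypothetical (chosen to avoid clashing with the POSIX getline function); the sketch also skips the strip step when fgets fails, a small deviation from the listing.

```c
#include <stdio.h>

/* Truncate s at the first CR, LF, or '!' character. */
void strip_line(char *s)
{
    while (*s != '\0' && *s != '\n' && *s != '\r' && *s != '!')
        s++;
    *s = '\0';
}

/* Read lines from fp until one is non-empty after stripping.
   Returns line on success, or NULL at end of file or on error. */
char *next_data_line(char *line, int len, FILE *fp)
{
    char *ret;

    do {
        ret = fgets(line, len, fp);
        if (ret != NULL) strip_line(line);
    } while (ret != NULL && line[0] == '\0');
    return ret;
}
```

Note that a line holding only spaces before a '!' is not empty after stripping and is therefore returned to the caller, which matches the behavior of the original routines.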

/* remove commas from string, replacing with spaces 
*/ 

void decomma(string string)
{
  for (; *string != '\0'; string++)
    if (*string == ',') *string = ' ';

} 

/* This function capitalizes a string
*/

void capitalize(string s)
{
  int o;

  o = 'a' - 'A';
  for (; *s; s++) if (*s >= 'a' && *s <= 'z') *s -= o;

} 
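
capitalize converts lower-case letters by subtracting the fixed offset 'a' - 'A', which relies on an ASCII-like character set. An alternative sketch using the standard library function toupper is shown below; this swaps in a different technique from the one in the listing, and `capitalize_std` is a hypothetical name.

```c
#include <ctype.h>

/* Upper-case a string in place using the C library's toupper.
   The cast to unsigned char avoids undefined behavior for
   characters with negative values. */
void capitalize_std(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}
```

In write_car_file this step turns a 3-letter code such as "Gly" into the upper-case residue name written to the car file.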

/* This function returns the 3-letter code for the amino acid
*/

void amino_acid_code_3(acid_label label, string code_3)
{
  switch (label) {
    case G: strcpy(code_3, "Gly"); break;
    case A: strcpy(code_3, "Ala"); break;
    case V: strcpy(code_3, "Val"); break;
    case L: strcpy(code_3, "Leu"); break;
    case I: strcpy(code_3, "Ile"); break;
    case S: strcpy(code_3, "Ser"); break;
    case T: strcpy(code_3, "Thr"); break;
    case D: strcpy(code_3, "Asp"); break;
    case E: strcpy(code_3, "Glu"); break;
    case N: strcpy(code_3, "Asn"); break;
    case Q: strcpy(code_3, "Gln"); break;
    case K: strcpy(code_3, "Lys"); break;
    case H: strcpy(code_3, "His"); break;
    case R: strcpy(code_3, "Arg"); break;
    case F: strcpy(code_3, "Phe"); break;
    case Y: strcpy(code_3, "Tyr"); break;
    case W: strcpy(code_3, "Trp"); break;
    case C: strcpy(code_3, "Cys"); break;
    case M: strcpy(code_3, "Met"); break;
    case P: strcpy(code_3, "Pro"); break;
    default: strcpy(code_3, "???");
  }
}

/* This function returns the 1-letter code for the amino acid 
*/ 

void amino_acid_code_1(acid_label label, char code_1)
{
  switch (label) {
    case G: code_1 = 'G'; break;
    case A: code_1 = 'A'; break;
    case V: code_1 = 'V'; break;
    case L: code_1 = 'L'; break;
    case I: code_1 = 'I'; break;
    case S: code_1 = 'S'; break;
    case T: code_1 = 'T'; break;
    case D: code_1 = 'D'; break;
    case E: code_1 = 'E'; break;
    case N: code_1 = 'N'; break;
    case Q: code_1 = 'Q'; break;
    case K: code_1 = 'K'; break;
    case H: code_1 = 'H'; break;
    case R: code_1 = 'R'; break;
    case F: code_1 = 'F'; break;
    case Y: code_1 = 'Y'; break;
    case W: code_1 = 'W'; break;
    case C: code_1 = 'C'; break;
    case M: code_1 = 'M'; break;
    case P: code_1 = 'P'; break;
    default: code_1 = '?';
  }



} 

/* This function returns the acid label from the 1-letter amino
   acid code
*/

acid_label amino_acid_code(char code_1)
{
  acid_label ret;
  switch (code_1) {
    case 'G': ret = G; break;
    case 'A': ret = A; break;
    case 'V': ret = V; break;
    case 'L': ret = L; break;
    case 'I': ret = I; break;
    case 'S': ret = S; break;
    case 'T': ret = T; break;
    case 'D': ret = D; break;
    case 'E': ret = E; break;
    case 'N': ret = N; break;
    case 'Q': ret = Q; break;
    case 'K': ret = K; break;
    case 'H': ret = H; break;
    case 'R': ret = R; break;
    case 'F': ret = F; break;
    case 'Y': ret = Y; break;
    case 'W': ret = W; break;
    case 'C': ret = C; break;
    case 'M': ret = M; break;
    case 'P': ret = P; break;
    default: ret = BAD;
  }
  return(ret);

} 
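
The three conversion routines above each spell out the same twenty residues in a switch statement. As an illustration only, the mapping from 1-letter to 3-letter codes can also be expressed with a pair of parallel tables; `three_letter_code` and the table names are hypothetical and do not appear in the listings, while the residue order and the "???" fallback follow the routines above.

```c
#include <stddef.h>
#include <string.h>

/* 1-letter codes in the order used by the listings. */
static const char one_letter_codes[] = "GAVLISTDENQKHRFYWCMP";

/* Matching 3-letter codes, index for index. */
static const char *const three_letter_codes[] = {
    "Gly", "Ala", "Val", "Leu", "Ile", "Ser", "Thr", "Asp", "Glu", "Asn",
    "Gln", "Lys", "His", "Arg", "Phe", "Tyr", "Trp", "Cys", "Met", "Pro"
};

/* Return the 3-letter code for a 1-letter code, or "???" if the
   code is not one of the twenty residues. */
const char *three_letter_code(char code_1)
{
    const char *p;

    if (code_1 == '\0') return "???";  /* strchr would match the terminator */
    p = strchr(one_letter_codes, code_1);
    return p != NULL ? three_letter_codes[p - one_letter_codes] : "???";
}
```

A table keeps the two codes for each residue side by side, so a mismatch between the forward and reverse mappings is easier to spot than in separate switch statements.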



MOLECULAR TOPOLOGY CREATION - PEPTIDE2.C

/* The topology creation routines
*/

#include "peptide.h"

/* This routine initializes the bond connection table
*/

void initialize_connection_table(int **bond_table, int n_atoms_total)
{
  int i, j;

  for (i=0; i<n_atoms_total; i++)
    for (j=0; j<MAX_BONDS; j++)
      bond_table[i][j] = -1;

} 
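
The connection table is a per-atom array of bonded-atom indices in which -1 marks an unused slot. The sketch below shows one way such a table could be allocated, initialized, and filled. It is an assumption-laden illustration: the listing's MAX_BONDS value is not shown in this section, so the slot count is passed as a parameter, and `new_connection_table`, `add_bond`, and `free_connection_table` are hypothetical helpers; how make_connection_table actually records bonds is not shown here.

```c
#include <stdlib.h>

/* Release a table of n_atoms rows. */
void free_connection_table(int **table, int n_atoms)
{
    int i;

    if (table == NULL) return;
    for (i = 0; i < n_atoms; i++) free(table[i]);
    free(table);
}

/* Allocate an n_atoms x max_bonds table with every slot set to -1.
   Returns NULL if memory runs out. */
int **new_connection_table(int n_atoms, int max_bonds)
{
    int i, j;
    int **table = (int **) calloc((size_t) n_atoms, sizeof(int *));

    if (table == NULL) return NULL;
    for (i = 0; i < n_atoms; i++) {
        table[i] = (int *) malloc((size_t) max_bonds * sizeof(int));
        if (table[i] == NULL) {
            free_connection_table(table, n_atoms);
            return NULL;
        }
        for (j = 0; j < max_bonds; j++) table[i][j] = -1;
    }
    return table;
}

/* Record a bond between atoms a and b in both rows (assumed
   symmetric storage). Returns 0 on success, -1 if a row is full. */
int add_bond(int **table, int max_bonds, int a, int b)
{
    int ja, jb;

    for (ja = 0; ja < max_bonds && table[a][ja] != -1; ja++) ;
    for (jb = 0; jb < max_bonds && table[b][jb] != -1; jb++) ;
    if (ja == max_bonds || jb == max_bonds) return -1;
    table[a][ja] = b;
    table[b][jb] = a;
    return 0;
}
```

Using -1 as the sentinel lets a reader of the table stop at the first unused slot without storing a separate bond count per atom.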